Variance Dynamics - An empirical journey
We investigate the joint dynamics of spot and implied volatility from an empirical perspective. We focus on the equity
market with the SPX Index our underlying of choice. Using only observable quantities, we extract the instantaneous variance
curves implied by the market and study their daily variations jointly with spot returns. We analyze the characteristics of
their individual and joint densities, quantify the non-linear relationship between spot and volatility, and discuss the modeling
implications on the implied leverage and the volatility clustering effects. We show that non-linearities have little impact on
the dynamics of at-the-money volatilities, but can have a significant effect on the pricing and hedging of volatility derivatives.


1 Introduction 


Equity implied volatility, as priced by the market through
vanilla options and volatility derivatives, is certainly not a
constant. Its behaviour is strongly linked to its underlying,
often appearing negatively correlated with spot returns. Yet,
at times, it exhibits spurts of independence, behaves capriciously,
and displays a life of its own. Understanding the
behaviour of volatility, be it for the purpose of risk management
or the pricing and hedging of derivatives, is crucial to
most market participants; unexpected moves can prove costly.

And the task is not easy. The meaning of implied volatility
is rich, as it inherently refers to multiple connected concepts,
e.g. an at-the-money (ATM) implied volatility observed for
a specific expiry on the entire volatility surface, or the full
term-structure of variance. Its dynamics are complex and
give birth to a range of distinctive regimes. This has led
practitioners to define sets of rules to identify and trade
around specific volatility patterns; among those, the concepts
of sticky-strike and sticky-delta, or the shadow gamma.


Over the years, a host of volatility models have been introduced
to better understand its complex behaviour. Those
fall broadly into a few representative classes, of which pure
stochastic volatility models (e.g. SABR, Heston [18]), variance
curve models, and spot-only-driven models (e.g.
GARCH models [14]) are probably the best known and
most used. Recent advances have shown that the dynamics
they generate are intrinsically constrained by the class they
belong to. For instance, Bergomi demonstrates
in [4] that the dynamics of ATM implied volatilities generated
by a stochastic volatility model are inherently linked to the
smile produced by the model. Therefore, a volatility model
often dictates more than the obvious, and should always be
selected based on a clear understanding of the properties one
wishes to model.


In this work, we study the joint dynamics of spot and implied
volatility from an empirical perspective. Our journey into the
volatility landscape is pragmatic. We analyze the properties
of volatility using as few assumptions as possible. Our aim
is to identify and quantify the meaningful patterns of spot
and implied volatility, and study their implications on the
modeling of volatility. We proceed as follows.

We extract, using only observable quantities, the joint variations
of the underlying market with the term-structure of
implied variance (sect. 2). Although direct observation of
the term-structure is not possible, estimation with minimal
distortion is achieved through the use of a general stochastic
volatility framework (sect. 2.3).

We then review step by step the model assumptions and discuss
the limitations of our approach (sect. 3.1). The in-depth
analysis serves as the basis to explore important properties
of the spot/vol dynamics. We probe the characteristics of
individual and joint densities and quantify the non-linear
relationship between spot and volatility. We find that the
implied leverage effect (i.e. the tendency of ATM volatility to
increase as the underlying market decreases) and the volatility
clustering effect (i.e. the propensity of volatility to stick
to recent past levels, also referred to as the heteroscedasticity
of volatility) are well-captured by the combination of non-linear
and non-Gaussian properties (sect. 3.2 and 3.3). We
then study the mean-reverting nature of volatility of volatility;
although our goal is not to introduce another volatility
model, we suggest some potential avenues for improvement
(sect. 3.4).

In the last part, we gauge the impact of non-linearities on the
pricing, modeling, and hedging of derivatives. We find them
to have little influence on the dynamics of ATM volatilities:
on equity indices, the linear spot/vol correlation remains the
dominant factor (sect. 4.1 and 4.2). However, convex effects
change the realised volatility of annualised variance, thereby
impacting the pricing and hedging of volatility derivatives
(sect. 4.3). Section 5 concludes.


2 Extracting short-term dynamics 

Focusing on short-term dynamics, we extract from observable
quantities the daily variations of an asset $S_t$ and of the
term-structure of its implied volatility, $\xi_t^u$ for $u > t$. Our observables
are listed futures on spot and on implied volatility. Futures
on implied volatility are a rich source of information on the
term-structure of volatility, as long as one is careful enough to
correct for the small convexity adjustment present in those.
In this paper, our example of choice is the SPX index and its
corresponding volatility metric, the VIX index.^1

2.1 Dataset and notations 

The VIX index is a real-time measure of the market’s implied
variance of the SPX index over the next 30 calendar days. For
each trading time $t$, several futures are quoted. We denote
them by $V_t^{T_i}$, where $T_i$ is the expiration date; by an obvious
extension, $V_t^t$ represents the VIX index. These futures span a
term-structure of several months, providing an observable but
indirect measure of the implied variance priced by the market.

Although the trading in VIX futures began on March 26,
2004, liquidity remained low until 2008. The credit crisis
changed the whole landscape. With volatility jumping
suddenly to unexpected highs, more and more market participants
started to envision volatility as a potential hedge
for their portfolio. In the years that followed, VIX trading
increased significantly. Since 2012, approximately fifty thousand
futures contracts trade on a daily basis on the first and
second expiries (see Fig. 1); medium-term futures, of approximately
6-month maturity, can be traded with reasonable
liquidity, e.g. approximately five thousand contracts per day.



Figure 1: Liquidity of VIX Futures. Each curve represents
in log-scale the average daily volume of the traded futures.
The daily volumes have been averaged with an exponential kernel
with half-life of one month.
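
The smoothing described in the caption of Figure 1 can be illustrated with a minimal sketch. It assumes a simple recursive exponential filter and takes 21 trading days as a stand-in for a one-month half-life; the function name `ewma_half_life` and the sample volumes are hypothetical and are not taken from the dataset.

```python
import numpy as np

def ewma_half_life(values, half_life):
    """Exponentially weighted moving average whose weights halve every `half_life` samples."""
    decay = 0.5 ** (1.0 / half_life)          # per-step weight multiplier
    out = np.empty(len(values), dtype=float)
    out[0] = values[0]                        # initialise with the first observation
    for i in range(1, len(values)):
        out[i] = decay * out[i - 1] + (1.0 - decay) * values[i]
    return out

# Hypothetical daily volumes (number of contracts), for illustration only.
daily_volume = np.array([40_000, 55_000, 48_000, 60_000, 52_000], dtype=float)
smoothed = ewma_half_life(daily_volume, half_life=21)   # ~one month of trading days
log_smoothed = np.log10(smoothed)                        # the figure uses a log scale
```

A recursive filter is only one way to implement an exponential kernel; a centered or normalised kernel would give slightly different values at the start of the sample.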

Our dataset comprises the VIX index and the VIX futures
from the end of 2007. We exclude the early days of VIX
trading: given the lack of liquidity and the inconsistency of the
quotes, those data cannot be trusted. We focus on daily close-to-close
variations. Consequently, our dataset contains more than
1500 daily observations, each observation point consisting of
the VIX index and 7 futures.

We now introduce our notations. The risk-neutral market
measure is denoted by $E_t^{\mathbb{Q}}[\cdot]$, or simply by $E_t[\cdot]$ when there
is no ambiguity. The real-but-unknown measure is $E^{\mathbb{P}}[\cdot]$, which
we approximate by the historical measure. We denote by $\sigma_t^2\,\delta t$
the variance realised by the spot process $S_t$ between times $t$
and $t + \delta t$, and by $\mathrm{var}^{T_1,T_2}$ the total variance realised between
$T_1$ and $T_2$:

$$\mathrm{var}^{T_1,T_2} = \sum_{u=T_1}^{T_2} \sigma_u^2\,\delta u \approx \int_{T_1}^{T_2} \sigma_u^2\,du.$$

We make a clear distinction between an achievable finite
sampling, denoted by $\delta t$, and its theoretical limit, representing
an infinitely-small instantaneous sampling, denoted by
$dt$ and used profusely in stochastic calculus. The continuous
formalism has the great advantage of simplifying proofs and
equations, but can sometimes hide subtle sampling effects.
The different integrals appearing in the text should often
be interpreted as discrete sums over the discrete sampling
period.^2 Because we are working with daily observations
corresponding to trading business days, the sampling frequency
is by convention set to 252, i.e. $\delta t = 1/252$. Note
that the variance $\sigma_t^2$ is not known at time $t$, as it depends on
the future return $\delta S_t / S_t$ between $t$ and $t + \delta t$.

^1 The term-structure of variance could also be extracted from the quotes of vanilla options using the well-known replication of variance. However,
as attractive as it sounds, this approach is not trivial: it necessitates a complete option database and it also raises some non-trivial modeling
questions, such as how to interpolate in time (between expiries) and in space (between strikes), or how to handle missing or incorrect data.
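
The sampling convention above can be made concrete with a short sketch. It assumes that the realised variance over one step is measured by the squared daily return, so that $\sigma_t^2\,\delta t = (\delta S_t / S_t)^2$; the function name `realised_variance` and the price series are hypothetical.

```python
import numpy as np

TRADING_DAYS = 252
dt = 1.0 / TRADING_DAYS          # delta t, one business day in years

def realised_variance(prices):
    """Return the daily annualised variances sigma_t^2 and the total variance over the window."""
    prices = np.asarray(prices, dtype=float)
    returns = np.diff(prices) / prices[:-1]   # delta S_t / S_t, only known at t + delta t
    sigma2 = returns ** 2 / dt                # assumption: sigma_t^2 * dt equals the squared return
    total_var = np.sum(sigma2 * dt)           # discrete sum standing in for the integral
    return sigma2, total_var

# Hypothetical closes producing returns of +1%, -1%, +1%.
prices = [100.0, 101.0, 99.99, 100.9899]
sigma2, total_var = realised_variance(prices)
annualised_vol = np.sqrt(total_var / (len(sigma2) * dt))
```

With three daily moves of 1% each, the total variance is the sum of the squared returns and the annualised volatility equals 1% times the square root of 252.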

The time-$t$ term-structure of variance $\xi_t^u$, $u > t$, is not
directly observable, but will be deduced from the values
of VIX futures. Because the instantaneous implied variance
is a martingale under the risk-neutral measure [3], we also
have $E_t[d\xi_t^u] = 0$. This martingale property is the basis of
numerous stochastic volatility models. We also
follow that choice, because of the generality and flexibility of
the resulting volatility models.


We denote by $K_t^{T_1,T_2}$ the fair-value at time $t$ of the annualised
variance defined over the time interval $[T_1, T_2]$:

$$K_t^{T_1,T_2} = E_t\left[\frac{\mathrm{var}^{T_1,T_2}}{T_2 - T_1}\right].$$

When $t < T_1$, the forward-starting variance strike $K_t^{T_1,T_2}$ is
equal to $\frac{1}{T_2 - T_1}\int_{T_1}^{T_2} \xi_t^u\,du$.

By definition, the level of the VIX index, also referred to
as VIX cash or VIX spot, calculated at time $T_1$ should be
equal to $\sqrt{K_{T_1}^{T_1,T_2}}$, where $T_2 = T_1 + 30/365$. In practice, the
VIX is not exactly the 30-day implied volatility. First, it is
necessary to take into account the correct number of returns
in the next 30-day period by scaling the observed levels by
the appropriate factor. In our dataset, we adjust each observation,
i.e. the values of VIX index and VIX futures, by the
correct factor without mentioning it explicitly. Second, it is
a well-known fact that the VIX index, being computed as a
linear interpolation between two incomplete strips of options,
is only an imperfect proxy of variance. In particular, the VIX
index exhibits two quirks, which can sometimes be mistaken
for a real volatility impact. The first one is linked to
the roll mechanism of the listed options, which can cause an
artificial change in the VIX level (the historical methodology
meant that 8 days before the expiry, the selected options
roll from the 1st and 2nd expiries to the 2nd and 3rd; with
the emergence of weekly options, this issue has become less
relevant). The second quirk comes from the fact that the
addition (new trade) or deletion (trade closing) of some
out-of-the-money options can lead to a sudden jump of the VIX
value. Fortunately, VIX futures do not suffer from those artefacts,
as they are only an expectation of the future VIX index.

With the previous notations, we can finally express the value
at time $t$ of a VIX future expiring at maturity $T_1$:

$$V_t^{T_1} = E_t\left[\sqrt{K_{T_1}^{T_1,T_2}}\right] \qquad (1)$$

2.2 Spot model

The modeling of the spot returns is rather natural. We model
the dynamics of the return $\delta S_t / S_t$ as a stochastic realisation
of the instantaneous implied variance over the finite time
interval $\delta t$. Working with futures written on the SPX index,
the spot model is defined as follows:

$$\frac{\delta S_t}{S_t} = \sqrt{\xi_t^t}\;\delta Z_t \qquad (2)$$

where $\delta Z_t$ is a centered random variable of variance $\delta t$. Consequently,
the equality $\frac{1}{\delta t} E_t[(\delta S_t / S_t)^2] = E_t[\sigma_t^2] = \xi_t^t$ is naturally
enforced.

Although the variable $\delta Z_t / \sqrt{\delta t}$ is centered and normalized
to unity under the risk-neutral measure, it is not necessarily so
in practice. We denote by $\mu_Z$ and $\sigma_Z$ the annualised trend
and volatility of the stochastic process $Z_t$ under $\mathbb{P}$:

$$\mu_Z\,\delta t = E_t^{\mathbb{P}}[\delta Z_t] \quad \text{and} \quad \sigma_Z^2\,\delta t = E_t^{\mathbb{P}}\left[(\delta Z_t - \mu_Z\,\delta t)^2\right]$$

This discrepancy between market view and realised variance
is the basis of the volatility risk premium: the predictive
power of the implied instantaneous variance is quite poor,
and for most times realised variance sits below its implied level,
or equivalently $\sigma_Z < 1$ (see section 3.2).

Finally, note that the variable $\delta Z_t$ does not need to be Gaussian,
allowing the modeling of the discrete nature of the
returns. We denote by $\zeta = E^{\mathbb{P}}[\delta Z^3]$ and $\kappa = E^{\mathbb{P}}[\delta Z^4] - 3$ its
skew and excess kurtosis.
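
As an illustration of how these historical moments could be estimated, the sketch below normalises daily returns by the implied instantaneous variance and computes sample statistics. It is a sketch under stated assumptions: the spot model is read as $\delta S_t / S_t = \sqrt{\xi_t^t}\,\delta Z_t$, skew and kurtosis are computed on the standardised innovations (a normalisation choice made here), and the data are simulated. The function name `innovation_moments` is hypothetical.

```python
import numpy as np

dt = 1.0 / 252

def innovation_moments(returns, implied_var):
    """Sample trend, volatility, skew and excess kurtosis of the normalised innovations."""
    u = np.asarray(returns) / np.sqrt(np.asarray(implied_var) * dt)  # delta Z / sqrt(dt)
    mu_z = u.mean() / np.sqrt(dt)      # annualised trend, from E[delta Z] = mu_Z * dt
    sigma_z = u.std()                  # annualised volatility, from Var[delta Z] = sigma_Z^2 * dt
    c = (u - u.mean()) / sigma_z       # standardised innovations (assumed normalisation)
    skew = np.mean(c ** 3)
    excess_kurt = np.mean(c ** 4) - 3.0
    return mu_z, sigma_z, skew, excess_kurt

# Simulated example: Gaussian returns realising 80% of a hypothetical 20% implied volatility.
rng = np.random.default_rng(0)
implied_var = np.full(5000, 0.04)
returns = 0.8 * np.sqrt(implied_var * dt) * rng.standard_normal(5000)
mu_z, sigma_z, skew, excess_kurt = innovation_moments(returns, implied_var)
```

In this simulated case the estimated $\sigma_Z$ is close to 0.8, i.e. below one, which is the situation described above where realised variance undershoots its implied level.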


2.3 Stochastic variance model 

Our goal is to extract the factors driving the term-structure
of variance as accurately as possible and without having
to rely significantly on any volatility model. However, as can
be seen from Eq. 1, a VIX future provides only an indirect
measure of the implied variance curve between times $T_1$ and
$T_2$. The lens of observation is the risk-neutral market pricing,
which must be modeled in order for us to reach back to the
underlying variable $\xi_t^u$. Therefore, we must select a volatility
model capable of pricing VIX futures.

Selecting a volatility model allows us to derive, for each observation
date $t$, an accurate estimate of the term-structure of
volatility implied by observable VIX futures. This is
achieved through a pricing equation (Eq. 5) that defines the
convexity correction inherent to VIX futures. The convexity
adjustment is a function of the model parameters, which are
estimated from the daily variations of VIX futures (through
Eq. 6). Although distinct volatility models would lead to different
convexity adjustments, the differences would be small
and would not change the analysis. In practice, the magnitude
of the convexity correction is so small that the convexity
adjustment is almost inconsequential for our purpose. Once
calibration is achieved, we will be in a position to revisit and
discuss some of our modeling assumptions (section 3).

^2 That being said, we also approximate discrete sums by their equivalent integrals when this makes sense. For instance, sums such as
$\sum_{i=0}^{N-1} e^{-k\,i\,\delta t}\,\delta t$ would be approximated by their integral counterpart $\int_0^{T} e^{-ku}\,du = \frac{1 - e^{-kT}}{k}$ with $T = N\,\delta t$, valid as long as $\delta t$ is small or equivalently
$N$ large. As we just said, the integral approximations have the advantage of greatly simplifying the equations.



Figure 2: Instantaneous Variance Curves. We graph some
estimated instantaneous variance curves (in black) over the
period 2011-2012, as well as the corresponding magnitude of
the convexity correction in shaded gray below each curve. The
VIX cash index is displayed in red and the underlying market
in blue.

2.3.1 Description of the volatility model 

The rather-unpredictable nature of volatility is usually modeled
using stochastic models.^3 Quite generally, the instantaneous
term-structure of variance $\xi_t^u$ with $t < u$ is assumed to
be driven by a set of $n$ Brownian motions:

$$\frac{d\xi_t^u}{\xi_t^u} = \sum_{\alpha=1}^{n} \theta_\alpha\,\omega^\alpha(t,u,\xi_t)\,dW_t^\alpha \qquad (3)$$

with correlation matrix $C_{\alpha,\beta} = \frac{1}{dt}\,E\left(dW_t^\alpha\,dW_t^\beta\right) = \rho_{\alpha,\beta}$.

Although the stochastic variance model is expressed in a
continuous setting, it is really a discrete framework that is
described by the above equation: more often than not, the
infinitesimal term $d$ should be understood as a finite variation
$\delta$. The above framework, which is built on the martingale
properties of implied variance, is quite general and flexible.
Our choice of using a log-normal model was motivated by recent
studies [16], as well as its popularity. Working at a finite
time scale, non-linearities can easily be taken into account by
simply introducing non-linear relationships between volatility
factors and spot returns (see section 3.3).


We denote by $\Theta$ the diagonal matrix with diagonal terms
$\theta_\alpha$ and by $Q$ the covariance matrix defined by $Q_{\alpha,\beta} = \theta_\alpha\theta_\beta\rho_{\alpha,\beta}$;
the coefficients $\theta_\alpha$ represent the volatility of the $\alpha$ factors.
The instantaneous volatility $\nu_t^u$ of the instantaneous variance
is maturity-dependent and equal to:

$$\nu_t^u = \sqrt{\sum_{\alpha,\beta} \omega^\alpha(t,u,\xi_t)\,\omega^\beta(t,u,\xi_t)\,Q_{\alpha,\beta}} \qquad (4)$$

The weighting functions $\omega^\alpha$ might depend on the curve $\xi_t$
and time, but not on the spot $S_t$; this is the choice commonly followed
in the literature. Quite often, they are chosen as time-invariant
decreasing functions, i.e. $\omega^\alpha(t,u,\xi_t) = \omega^\alpha(u - t,\xi_t)$, expressing
the fact that a random shock at time $t$ impacts the whole
term-structure of variance with a magnitude $\omega^\alpha(u - t,\xi_t)$
decreasing with maturity $u - t$. Frequently, the functions are
defined as exponentials $\omega^\alpha(t,u,\xi_t) = \exp(-k_\alpha(u - t))$. As
a result, the stochastic model becomes Markovian and can
be integrated exactly in closed-form [3]. Although we do not
need these explicit properties, the additive separability of
exponentials greatly simplifies the analysis. We follow that
choice in our numerical simulations.
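
To make the exponential-kernel choice concrete, the sketch below evaluates the instantaneous volatility of the instantaneous variance as a function of maturity. It assumes two factors with kernels $\exp(-k_\alpha(u - t))$ and reads the volatility as $\sqrt{\sum_{\alpha,\beta}\omega^\alpha\omega^\beta Q_{\alpha,\beta}}$ with $Q_{\alpha,\beta} = \theta_\alpha\theta_\beta\rho_{\alpha,\beta}$; the numerical values are the calibrated parameters reported in section 2.3.2, and the function name `vol_of_variance` is hypothetical.

```python
import numpy as np

# Two-factor parameters as reported in the calibration table of section 2.3.2.
k = np.array([10.25, 1.05])        # fast and slow kernel speeds
theta = np.array([1.80, 0.92])     # factor volatilities
rho = 0.51                         # correlation between the two factors

corr = np.array([[1.0, rho], [rho, 1.0]])
Q = np.outer(theta, theta) * corr  # Q[a, b] = theta_a * theta_b * rho_ab

def vol_of_variance(tau):
    """Volatility of the instantaneous variance at maturity tau = u - t (in years)."""
    w = np.exp(-k * tau)           # exponential kernels omega^alpha(u - t)
    return np.sqrt(w @ Q @ w)

short_end = vol_of_variance(0.0)   # approximately 2.40, i.e. 240%
six_months = vol_of_variance(0.5)
```

The fast factor dominates the short end and dies out within a few months, after which the volatility of variance is essentially that of the slow factor.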


Equations 2 and 3 define a general stochastic spot-vol
model. When $\Theta = 0$, the model reduces to a simple
Black-Scholes (BS) model with a deterministic, spot-independent,
diffusion variance defined by $\forall u > t$, $\xi_u^u = \xi_t^u$. In that case,
the Black-Scholes volatility is also the variance-swap volatility
$\sigma_{VS}(t,T) = \sqrt{K_t^{t,T}}$, the volatility smile being obviously
flat.

In practice, a small number of driving factors is sufficient to
accurately capture the dynamics of variance. Cont and da
Fonseca show that the 3 principal modes represent 98% of
the variance of the daily curve deformations [12]. The first
mode, amounting to 80%, can be interpreted as a level effect,
whereas the second and third modes correspond respectively
to slope and convexity. Our data set does not capture the
long end of the variance curve since every observation is limited
to the first seven expiries, just above half a year. As
such, using a large number of factors might lead to overfitting
and parameter instabilities. For that reason, we follow
the methodology introduced by Bergomi in [3] and Gatheral
in [16], and select two factors only (note that we also investigated
the use of a three-factor model, but did not observe
any significant improvement; see discussion in section 3.1).

^3 The use of GARCH models would not be appropriate in our case, as they do not possess any volatility factors. They ignore the independent
nature of volatility, which is exactly what we aim to model. As such, they would be too restrictive for our purpose. It is interesting to note
that by specifying the stochastic volatility factors as deterministic functionals of the spot factor, some GARCH models can be interpreted as
reduced versions of more general stochastic volatility models; see section 3.3.1.


From Eq. 1 and 3, we can derive the time-$t$ value of a VIX
future $V_t^{T_i}$, as well as its variation $\delta V_t^{T_i}$. In particular, a VIX
future is always below^4 the corresponding forward-starting
variance level $\sqrt{K_t^{T_i,T_i+\Delta T}}$.

The convexity adjustment comes from the concavity of the
square-root function and is proportional to the vol of vol parameters
$Q_{\alpha,\beta}$.

Introducing the notation
$I_t^{T_i,\alpha} = \int_{T_i}^{T_i+\Delta T} \xi_t^u\,e^{-k_\alpha(u - T_i)}\,du \Big/ \int_{T_i}^{T_i+\Delta T} \xi_t^u\,du$,
we finally obtain the set of equations that we will use throughout
the paper (see appendix 6.1 for the derivation steps):

Pricing Equation

$$V_t^{T_i} = \sqrt{K_t^{T_i,T_i+\Delta T}} \times (1 - \text{convexity correction}) \qquad (5)$$

$$\text{c.c.} = \frac{1}{8}\sum_{\alpha,\beta} Q_{\alpha,\beta}\,I_t^{T_i,\alpha}\,I_t^{T_i,\beta}\,\frac{1 - e^{-(k_\alpha + k_\beta)(T_i - t)}}{k_\alpha + k_\beta}$$

Model Dynamics

$$\frac{dV_t^{T_i}}{V_t^{T_i}} = \frac{1}{2}\sum_{\alpha} \theta_\alpha\,e^{-k_\alpha(T_i - t)}\,I_t^{T_i,\alpha}\,dW_t^\alpha \qquad (6)$$

In the remainder of this section, we describe the calibration
of our volatility model and the extraction of the volatility
factors. The reader uninterested by the technical details can
safely jump to section 3.

2.3.2 Fitting the volatility model

The stochastic model is specified by the set of parameters
$\mathcal{S}$, which comprises the number of Wiener processes
driving the instantaneous variance (set to two in this work),
the shape of the kernel functions (defined by the parameters
$k_\alpha$), and the corresponding covariance structure $Q$.

Our volatility model, which aims at capturing the short-term
variations of VIX futures, is not perfect and will fail to
match perfectly the variations of the entire term-structure.
This is partly, but not only, due to the inadequacy and simplicity
of our model. Due to the complexity of volatility dynamics,
any volatility model would fail at some level and some
matching errors will always be present. Those would also be
enhanced (and sometimes caused) by the illiquidity of some
VIX futures and the inaccuracies of their quotes.

These matching errors must be accounted for. To do so, we
follow a standard approach and introduce for each variation
$\delta V_t^{T_i} / V_t^{T_i}$ a measurement error-term $\eta_t^i$. The measurement term
is modeled as Gaussian noise with volatility $\sigma_t^{\eta,i}$ inversely
proportional to the current liquidity of the corresponding
VIX futures. By doing so, we force the variations of the most
liquid futures, i.e. the ones that are traded the most, to be
better modelled by our volatility model than less-liquid (e.g.
longer-term) futures.^5 Consequently, a future’s variation is
the sum of a model-term and an error-term:

$$\frac{\delta V_t^{T_i}}{V_t^{T_i}} = \underbrace{\sum_{\alpha} M_t(i,\alpha)\,\theta_\alpha\,\delta W_t^\alpha}_{\text{model-term}} + \underbrace{\sigma_t^{\eta,i} \times \eta_t^i}_{\text{error-term}} \qquad (7)$$

We introduce the additional notations:

- $D_t$ is a diagonal matrix, $D_t(i,i) = \sigma_t^{\eta,i}$

- $M_t$ is the matrix of model loadings: $M_t(i,\alpha)$ is the coefficient of $\theta_\alpha\,\delta W_t^\alpha$ in the variation of the $i$-th future given by the model dynamics (Eq. 6)

- $\delta W_t$ is the Gaussian column-vector with components $\delta W_t^\alpha$

- $U_t$ is the normalized Gaussian $U_t = \frac{1}{\sqrt{\delta t}}\,Trl^{-1}\,\delta W_t$, with
$Trl$ a lower triangular matrix such that $Trl \times Trl^{T} = C$
(i.e. Cholesky’s decomposition). Working with $U_t$ or $\delta W_t$ is
equivalent, but $U_t$ has the property of having uncorrelated
normalized-to-unity components.
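
The change of variables between correlated increments and uncorrelated unit-variance components can be sketched as follows. The example assumes two factors with the calibrated correlation of 51% and simulated Gaussian draws; the variable names are illustrative only.

```python
import numpy as np

dt = 1.0 / 252
rho = 0.51
C = np.array([[1.0, rho], [rho, 1.0]])      # correlation matrix of the Brownian increments
Trl = np.linalg.cholesky(C)                  # lower triangular, Trl @ Trl.T == C

rng = np.random.default_rng(1)
U = rng.standard_normal((2, 100_000))        # uncorrelated components, unit variance
dW = np.sqrt(dt) * (Trl @ U)                 # correlated increments with variance dt

# Inverse map: U_t = Trl^{-1} dW_t / sqrt(dt)
U_back = np.linalg.solve(Trl, dW) / np.sqrt(dt)
sample_corr = np.corrcoef(dW)[0, 1]          # close to rho up to sampling noise
```

Solving the triangular system with `np.linalg.solve` avoids forming the inverse of `Trl` explicitly, which is the usual numerical practice.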


^4 Although VIX futures should always quote below their corresponding forward-variance level, it is not always the case in practice. Dislocations
do appear from time to time. However, those are extremely hard to capture, as bid-offers render the arbitrage impossible. Those dislocations are
not frequent, and would not change the results of our analysis.

^5 Note that our dataset also includes VIX index levels that do not possess any liquidity since the VIX index is not tradeable. In addition, we
have seen that the VIX index is more prone to unacceptable variations (by construction). To alleviate these issues, we do not use the VIX index
variations to calibrate our model parameters (in Eq. 6). However, we do use its value to extract more accurately the term-structure of variance
from Eq. 5.
This allows us to recast the above equation in matrix form
as:

$$\frac{\delta V_t}{V_t} = M_t \times \underbrace{\Theta \times Trl \times U_t \times \sqrt{\delta t}}_{\Theta \times \delta W_t} + D_t \times \eta_t \qquad (8)$$

At this point we are ready to rephrase our calibration procedure
into a Bayesian framework. One can express the probability
of our observations as $p(\ldots, \frac{\delta V_t}{V_t}, \ldots \,|\, \mathcal{S})$ and extract the
model parameters by maximum-likelihood:

$$\mathcal{S}^* = \arg\max_{\mathcal{S}}\; p\left(\ldots, \frac{\delta V_t}{V_t}, \ldots \,\Big|\, \mathcal{S}\right)$$

Although maximum-likelihood has notorious convergence
problems (often due to the presence of numerous local optima),
we did not experience such issues with two factors; however,
with three factors, numerous optimizations from randomly
selected starting points had to be run. The joint probability
can be decomposed as a product of independent probabilities:

$$p\left(\ldots, \frac{\delta V_t}{V_t}, \ldots \,\Big|\, \mathcal{S}\right) = \prod_t \int_{\delta W_t} p\left(\frac{\delta V_t}{V_t}, \delta W_t \,\Big|\, \mathcal{S}\right) d\delta W_t \qquad (9)$$

Traditionally, the last integral cannot be integrated directly
and one usually resorts to an iterative expectation-maximization
algorithm thanks to a conventional Jensen argument.
Fortunately for us, integrating Eq. 9 does not present
any difficulty since the joint density $p(\frac{\delta V_t}{V_t}, \delta W_t \,|\, \mathcal{S})$ can be
expressed as a simple product of Gaussian multivariates:

$$p\left(\frac{\delta V_t}{V_t}, \delta W_t \,\Big|\, \mathcal{S}\right) = p\left(\frac{\delta V_t}{V_t} \,\Big|\, \delta W_t, \mathcal{S}\right) \times p(\delta W_t \,|\, \mathcal{S})$$

Using the well-known identity for the integration of a multivariate
variable $x$ of dimension $n$,

$$\int_{-\infty}^{\infty} \exp\left(-\frac{1}{2}x^{T}Ax + J^{T}x\right)dx = \frac{(2\pi)^{n/2}}{|A|^{1/2}} \exp\left(\frac{1}{2}J^{T}A^{-1}J\right),$$

we find that the log of each integral is
proportional (ignoring useless constant terms) to:

$$-\log|D_t| - \frac{1}{2}\log\left|Id + Trl^{T}\Theta M_t^{T}D_t^{-2}M_t\Theta\,Trl\;\delta t\right| - \frac{1}{2}\mu^{T}\mu$$
$$+ \frac{1}{2}\left(\mu + Trl^{T}\Theta M_t^{T}D_t^{-2}\frac{\delta V_t}{V_t}\sqrt{\delta t}\right)^{T}$$
$$\times \left(Id + Trl^{T}\Theta M_t^{T}D_t^{-2}M_t\Theta\,Trl\;\delta t\right)^{-1}$$
$$\times \left(\mu + Trl^{T}\Theta M_t^{T}D_t^{-2}\frac{\delta V_t}{V_t}\sqrt{\delta t}\right)$$

where $\mu = E_t^{\mathbb{P}}[U_t] = E^{\mathbb{P}}[U]$. Although the risk-neutral expectation
of the factors $\delta W_t$ is zero (since the instantaneous
variance is a martingale under the risk-neutral measure),
nothing guarantees that this property should also hold under
the real-but-unknown measure. It is actually a well-known
fact that the implied variance has realized negative decay,
i.e. $E^{\mathbb{P}}[\delta W_t] < 0$, giving rise to the well-known
term-structure volatility risk premia.

Solving for the model parameters is now straightforward.
Because the matrices $M_t$ depend on the estimated curves $\xi_t$,
which themselves depend on the set of parameters $\mathcal{S}$, we
must proceed iteratively in pseudo expectation-maximization
fashion:

1. Modeling step. First, given a fully specified model (i.e.
a full set of parameters $\mathcal{S}$), the instantaneous variance
term-structure $\xi_t$ can be extracted from Eq. 5 for each
time $t$. Without any assumption on the curve $\xi_t$, our
problem would not be tractable. We assume that the
variance term-structure is smooth^6 and parameterize
each variance curve by a small number of basis functions
capturing most of the variability of the curve. Figure 2
graphs some examples of estimated curves.

2. Expectation step. Once the curves have been estimated,
the integrals appearing in Eq. 5 (or equivalently
the matrix $M_t$ in Eq. 8) can be computed.

3. Maximisation step. We are then in a position to determine
the unknown parameters by simple maximum-likelihood.
In practice, to avoid convergence problems,
the kernel parameters $k_\alpha$ are iteratively selected on a
spanning grid, being kept fixed while the remaining parameters
are estimated by maximum-likelihood.
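
The Gaussian integration used in the calibration can be checked numerically. The sketch below is written under explicit assumptions: one observation follows $y = A\,u + D\,\eta$ with $u \sim N(\mu, Id)$ and $\eta \sim N(0, Id)$, where $A$ stands in for $M_t\,\Theta\,Trl\,\sqrt{\delta t}$ and $y$ for $\delta V_t / V_t$. It compares the log-density obtained from the Gaussian integral identity with the direct Gaussian marginal $N(A\mu, AA^{T} + D^2)$. All inputs are random placeholders and both function names are hypothetical.

```python
import numpy as np

def log_marginal_closed_form(y, A, d, mu):
    """log p(y) obtained by integrating out u with the Gaussian integral identity."""
    m = len(y)
    dinv2 = 1.0 / d ** 2                                   # diagonal of D^{-2}
    B = np.eye(A.shape[1]) + A.T @ (dinv2[:, None] * A)    # Id + A^T D^{-2} A
    J = mu + A.T @ (dinv2 * y)                             # linear term of the exponent
    _, logdet_b = np.linalg.slogdet(B)
    return (-0.5 * m * np.log(2.0 * np.pi) - np.sum(np.log(d)) - 0.5 * logdet_b
            + 0.5 * J @ np.linalg.solve(B, J)
            - 0.5 * y @ (dinv2 * y) - 0.5 * mu @ mu)

def log_marginal_direct(y, A, d, mu):
    """log p(y) from the marginal Gaussian N(A mu, A A^T + D^2)."""
    m = len(y)
    cov = A @ A.T + np.diag(d ** 2)
    r = y - A @ mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * m * np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * r @ np.linalg.solve(cov, r)

# Random placeholder inputs: 7 futures, 2 factors.
rng = np.random.default_rng(2)
A = 0.05 * rng.standard_normal((7, 2))
d = 0.01 + 0.02 * rng.random(7)                # strictly positive error volatilities
mu = np.array([-0.075, -0.004])
y = A @ (mu + rng.standard_normal(2)) + d * rng.standard_normal(7)

closed = log_marginal_closed_form(y, A, d, mu)
direct = log_marginal_direct(y, A, d, mu)
```

The closed form only requires inverting a matrix whose size is the number of factors, which is why it is convenient when the number of futures exceeds the number of factors.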


^6 A more realistic modelling of the term-structure of variance would only assume piece-wise smooth functions, with discontinuities happening at
important financial dates (e.g. Federal Reserve board meetings, release of key indicators). This might prove crucial for stocks, but less for global
indices. Note that instead of decomposing the curve onto a small set of basis functions, we could have instead added to the pricing equation (Eq. 5)
a Tychonoff regularization term. We experimented with both approaches and did not notice any significant differences.
We iterate the above steps until convergence has been
achieved. On our dataset, the optimal set of parameters is
provided in the table below.

$k_F$    $k_S$    $\mu_F$    $\mu_S$    $\theta_F$    $\theta_S$    $\rho$
10.25    1.05     -7.5%      -0.4%      180%          92%           51%


Bergomi’s two-factor model has similar parameters. However,
in our calibration, the slow factor $k_S$ is significantly higher
(approximately 1 to be compared with 0.35 in [3]), mainly
due to the fact that our model is focused on short-to-medium-term
maturities (there is no need to model the long end of
the variance curve).

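Once the optimization is finished, we extract the hidden
states $\delta W_t$ by maximizing the posterior:

$$p\left(\delta W_t \,\Big|\, \frac{\delta V_t}{V_t}, \mathcal{S}\right) \propto p\left(\frac{\delta V_t}{V_t} \,\Big|\, \delta W_t, \mathcal{S}\right) \times p(\delta W_t \,|\, \mathcal{S})$$

The states $\delta W_t$ are the driving factors of implied volatility in
our general stochastic volatility framework. They are a simplified
representation of the implied volatility term-structure
variations.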



Figure 3: Optimization. We graph the optimized estimated
parameters as a function of the selected fast factor $k_F$. The
minimum of the log-likelihood (in dotted red) is achieved for a
fast factor $k_F \approx 10.25$. For different values of $k_F$, the other
parameters do not vary significantly. For instance, the estimated
correlation $\rho$ and the first variance parameter $\theta_F$
are quite stable around 0.5 and 1.8 respectively. The slow
factor $k_S$ and the second variance parameter $\theta_S$ are slowly
increasing as the parameter $k_F$ is increased.


2.3.3 Convergence and Stability 

Convergence is achieved in a couple of iterations. At each
iteration, the expectation step might incur some errors (e.g.
due to incorrect model parameters). Those are
unlikely to cause any significant inaccuracies in the estimation
of the integrals, mainly because the convexity correction
magnitude is quite small and could almost be ignored
(see section 2.3.4). Figure 2 displays some of the estimated
instantaneous variance curves along with the corresponding
convexity corrections.

The accurate identification of the kernel functions is more
difficult. Whereas it is clear that two distinct modes exist,
i.e. one fast $k_F > 6$ and one slow $k_S < 1.5$, the log-likelihoods
of different and apparently-acceptable solutions do not always
differ significantly. For a wide range of acceptable solutions,
encompassing the range $6 < k_F < 14$ and $0.3 < k_S < 1.4$,
the remaining parameters $\theta_F, \theta_S, \rho$ are quite stable. Figure 3
illustrates this point.


2.3.4 Orders of magnitude 

We provide some ballpark numbers on the magnitude of
the convexity correction. To do so, we make the common
assumption of a relatively-flat term-structure of variance.
For instance, focusing on the interval $[T_1,T_2]$, this
assumption means that the difference between the instantaneous
variance $\xi_t^u$ for $u \in [T_1,T_2]$ and the annualized
variance $K_t^{T_1,T_2}$ is negligible compared to the variance itself.
Mathematically, this is often formulated as
$\forall u \in [T_1,T_2]$, $|\xi_t^u - K_t^{T_1,T_2}| \ll K_t^{T_1,T_2}$. We will use this
assumption several times throughout this work.

The integrals $I_t^{T_1,\alpha} = \int_{T_1}^{T_2} \xi_t^u\,e^{-k_\alpha(u - T_1)}\,du \Big/ \int_{T_1}^{T_2} \xi_t^u\,du$
entering the pricing equation can be approximated at first-order by
$g(k_\alpha \Delta T)$, with $g(x) = \frac{1}{x}\int_0^x e^{-u}\,du = \frac{1 - e^{-x}}{x}$
and $\Delta T = T_2 - T_1 = 30/365$. From there, the convexity adjustment
of Eq. 5 can also be approximated:

$$\text{c.c.} \approx \frac{1}{8}\sum_{\alpha,\beta} Q_{\alpha,\beta}\,g(k_\alpha \Delta T)\,g(k_\beta \Delta T)\,\frac{1 - e^{-(k_\alpha + k_\beta)(T_1 - t)}}{k_\alpha + k_\beta}$$

which converges to a limit of $\frac{1}{8}\sum_{\alpha,\beta} Q_{\alpha,\beta}\,\frac{g(k_\alpha \Delta T)\,g(k_\beta \Delta T)}{k_\alpha + k_\beta}$ for
long-term expiries. However, note that the limit for large maturities
should not be trusted. Our model has been calibrated
on short-term to medium-term expiries, and the simplicity of
our model would not be appropriate to evaluate the longer
end of the curve (over 6 months).

Plugging in our model parameters, we find that the convexity
adjustment is quite small for short maturities. For the
first two futures, the convexity is less than 5%, and could
almost be neglected; the limiting convexity adjustment for
long-term maturities is smaller than 10%.
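
The orders of magnitude quoted above can be reproduced with a short numerical sketch. It assumes the flat term-structure approximation, reads the convexity correction as $\frac{1}{8}\sum_{\alpha,\beta} Q_{\alpha,\beta}\,g(k_\alpha\Delta T)\,g(k_\beta\Delta T)\,(1 - e^{-(k_\alpha+k_\beta)(T_1-t)})/(k_\alpha+k_\beta)$, and uses the calibrated parameters of section 2.3.2; the function names are hypothetical.

```python
import numpy as np

k = np.array([10.25, 1.05])                 # calibrated fast and slow kernel speeds
theta = np.array([1.80, 0.92])              # calibrated factor volatilities
rho = 0.51
Q = np.outer(theta, theta) * np.array([[1.0, rho], [rho, 1.0]])
DT = 30.0 / 365.0                           # 30 calendar days, in years

def g(x):
    """g(x) = (1 - exp(-x)) / x, the average of exp(-u) over [0, x]."""
    return (1.0 - np.exp(-x)) / x

def convexity_correction(time_to_expiry):
    """Flat-curve approximation of the convexity correction of a VIX future."""
    gk = g(k * DT)
    ksum = k[:, None] + k[None, :]          # k_alpha + k_beta for every pair of factors
    decay = (1.0 - np.exp(-ksum * time_to_expiry)) / ksum
    return 0.125 * np.sum(Q * np.outer(gk, gk) * decay)

cc_two_months = convexity_correction(2.0 / 12.0)     # roughly the second future
gk = g(k * DT)
cc_limit = 0.125 * np.sum(Q * np.outer(gk, gk) / (k[:, None] + k[None, :]))
```

Under these assumptions the correction is a little above 3% at two months and tends to slightly less than 7% for long expiries, in line with the bounds stated in the text.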
At first order, we can assimilate the value of a VIX future to its
corresponding forward-variance strike, i.e. $V_t^{T_i} \approx \sqrt{K_t^{T_i,T_i+\Delta T}}$. We
immediately deduce that the volatility of a VIX future can
be approximated by:

$$\frac{1}{2}\sqrt{\sum_{\alpha,\beta} Q_{\alpha,\beta}\,g(k_\alpha \Delta T)\,g(k_\beta \Delta T)\,e^{-(k_\alpha + k_\beta)(T_i - t)}} \qquad (10)$$

which is slightly lower than half the volatility of variance $\nu_t^{T_i}$
as defined in Eq. 4.
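
A short sketch evaluates this approximation with the calibrated parameters. It assumes the flat-curve form $\frac{1}{2}\sqrt{\sum_{\alpha,\beta} Q_{\alpha,\beta}\,g(k_\alpha\Delta T)\,g(k_\beta\Delta T)\,e^{-(k_\alpha+k_\beta)(T_i-t)}}$ with $g(x) = (1 - e^{-x})/x$ and $\Delta T = 30/365$; the function name `vix_future_vol` is hypothetical.

```python
import numpy as np

k = np.array([10.25, 1.05])                 # calibrated fast and slow kernel speeds
theta = np.array([1.80, 0.92])              # calibrated factor volatilities
rho = 0.51
Q = np.outer(theta, theta) * np.array([[1.0, rho], [rho, 1.0]])
DT = 30.0 / 365.0

def g(x):
    return (1.0 - np.exp(-x)) / x

def vix_future_vol(time_to_expiry):
    """First-order volatility of a VIX future under the flat-curve approximation."""
    w = g(k * DT) * np.exp(-k * time_to_expiry)   # per-factor loading of the future
    return 0.5 * np.sqrt(w @ Q @ w)

vol_short = vix_future_vol(0.0)             # short-maturity limit, about 0.92
vol_six_months = vix_future_vol(0.5)
half_vol_of_variance = 0.5 * np.sqrt(np.ones(2) @ Q @ np.ones(2))   # half of nu at zero maturity
```

The short-maturity limit comes out near 90%, below half the volatility of the instantaneous variance (about 120%), because the 30-day averaging window damps the fast factor.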


3 Analysis and Discussion 

At this stage, our volatility model has been calibrated, and the
model factors $\delta Z_t$ and $\delta W_t$, as well as the daily variance
curves $\xi_t$, have been estimated. Before proceeding any further,
we check that our model can be trusted for the purpose
of better understanding volatility. We verify that it accurately
captures the main modes of variance curve deformations
(section 3.1). This check validates the use of hidden
states $\delta W_t$ to analyse the properties of volatility in a simpler
low-dimensional setting. The volatility factors constitute a
simplified representation of the term-structure variations.

We are then in a position to explore the joint dynamics
of spot and volatility. We proceed step by step. We probe
the characteristics of the individual densities of the model
factors, discuss the validity of the Gaussian assumption and
the implications on the magnitude of the volatility risk premia
(section 3.2). We then focus on the joint variations and
model the non-linear relationship between spot and volatility
(section 3.3). We highlight the link with GARCH models, and
study the impact on the implied leverage effect and the
heteroscedasticity of volatility. Finally, we delve into the volatility
of volatility itself (section 3.4). The mean-reverting nature of
the VVIX index suggests some possible improvements.


3.1 Model Adequacy 

To evaluate the adequacy of the stochastic variance model
with historical data, we conduct two elementary experiments.

To begin, we compare the theoretical term-structure of the
VIX volatility (given by our calibrated model in Eq. 10) with
historical realized levels. As figure 4 illustrates, the match is
satisfactory. The limit for short maturities, as $T_i - t \to 0$,
can be computed around 90%, a value slightly below the observed
volatility of the VIX index (at approximately 110%
since 2007). Figure 4 also shows that short-term volatility is
more volatile and displays more skewness and kurtosis than
the long end of the curve (see section 3.2).


[Figure 4 plot: "VIX Future's Daily Volatility as a function of Time to Expiry"; x-axis: Time to Expiry in Calendar Days]

Figure 4: Instantaneous VIX volatility. Each cross corresponds to the realized daily volatility of a VIX future plotted as a function of its time to expiry. For each given maturity, the circle represents the quadratic average of all corresponding daily volatilities. The curve represents the model volatility computed as in Eq. 10.


In a second experiment, we compute from the set of calibrated curves the principal orthogonal modes of the curve variations (i.e. the Karhunen-Loève decomposition). The first three principal eigenmodes capture more than 99% of the total variance. Figure 5 displays the corresponding modes. The first mode, covering almost 95% of the total variance, corresponds to a level effect, whereby the whole curve deforms subject to an implied volatility shock. The deformation is not uniform, but affects the short-term portion of the curve more, reflecting the higher volatility of short-term futures. As described by Cont and da Fonseca, the second and third eigenmodes can be identified with slope and convexity effects.

On this data set, the convexity effect provided by the third eigenmode is negligible as its contribution to the total variance is less than 1%. This is explained by the short time span provided by the first 7 VIX futures. Two principal modes are sufficient to capture 99% of the total variance. This explains why using more than two factors in our model calibration might lead to a difficult optimization and overfitting.
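
A minimal sketch of such a decomposition is given below. It assumes the daily curve variations are stored with one row per day and one column per maturity; the function name `curve_modes`, the synthetic two-factor data and all parameter values are illustrative assumptions, not the calibrated curves used in this study.

```python
import numpy as np

def curve_modes(curve_changes, n_modes=3):
    """Principal orthogonal modes of curve variations.

    curve_changes: array (n_days, n_maturities) of daily curve variations
    (hypothetical layout). Returns the fraction of variance explained by
    each of the first n_modes and the modes themselves (one per row).
    """
    x = np.asarray(curve_changes, dtype=float)
    x = x - x.mean(axis=0)                      # centre each maturity
    cov = x.T @ x / (x.shape[0] - 1)            # covariance across maturities
    eigval, eigvec = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]            # sort by decreasing variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    explained = eigval / eigval.sum()
    return explained[:n_modes], eigvec[:, :n_modes].T

# Synthetic data: two exponential factors observed on 7 maturities (assumed).
rng = np.random.default_rng(0)
tau = np.linspace(1 / 12, 7 / 12, 7)            # maturities in years
k_fast, k_slow = 8.0, 0.5                       # illustrative mean-reversion rates
shocks = rng.standard_normal((2000, 2))
changes = (shocks[:, :1] * np.exp(-k_fast * tau)
           + shocks[:, 1:] * np.exp(-k_slow * tau))

explained, modes = curve_modes(changes)
print(np.round(explained, 4))
```

Because the synthetic curves are driven by two factors only, the first two modes carry essentially all of the variance, which mirrors the argument for a two-factor calibration.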
To illustrate the adequacy of our simple model with the data set, we compute the two orthogonal modes implied by our stochastic model. Figure 5 displays the modes implied by the data (i.e. from estimated variance curves) and the model. The match between the model and the data is surprisingly good, except maybe for very short-term maturities (up to 2 weeks), where calibrated variance curves appear “flatter” than the exponential model modes. This might be due to the smoothing constraint that we introduced for the term-structure of variance. The emergence of short-term VIX futures, introduced in 2014 by CBOE, will progressively alleviate this issue.



Figure 5: Market and Model Modes. The modes implied by the data and the model are displayed in solid and dotted lines respectively. In both cases, the first mode represents 95% of the total variance. The second mode corresponds to a slope term, whereas the third mode captures convexity. Two modes are sufficient to accurately capture 99% of the variance.

3.2 The Gaussian is not the “normal”

We now turn our attention to the statistical properties of the stochastic factors $\delta Z_t$ and $\delta W^\alpha_t$, and of the variations of the instantaneous variance. We compute some elementary statistics over the whole period of study.

$X$                      $\mu_X$   $\sigma_X$   $\zeta_X$   $\kappa_X$   tail (2%)   tail (98%)
$\delta Z_t$             +33%      79.6%        -0.57       1.59         5.23        3.76
$\delta W^F_t$           -117%     100%         +0.36       4.25         3.25        3.35
$\delta W^S_t$           -68%      100%         +0.43       2.62         2.92        3.92
instantaneous variance   -67%      210%         +0.63       3.60         3.17        3.12


As expected, the equity factor $\delta Z_t$ exhibits significant negative skewness and an excess kurtosis, found to be around 1.5 and in line with numerous previous studies (see [7]). The small number of samples should however make us suspicious of any definite conclusion. Although there is enough data to prove the existence of significant excess kurtosis and fat tails, there is certainly not enough to calibrate with confidence the corresponding tail coefficients. More to the point, estimated tail coefficients in the 3-4 range clearly raise the question of convergence and of the finiteness of the kurtosis. We ignore this potential issue, and only conclude from the above numbers that the Gaussian assumption is clearly violated.

The volatility is positively skewed, with a much larger kurtosis (with the fast factor being more extreme than the slow factor). This does not come as a surprise as volatility's behaviour is notoriously wild. The tail coefficients, computed from extreme values at the 2% (below) and 98% (above) quantiles, are also representative of the underlying distributions. Positive skew $\zeta \approx +0.5$, significant excess kurtosis $\kappa > 3$, and a small positive tail coefficient $\approx 3$ clearly indicate that implied volatility can behave capriciously, even more so as the maturity decreases.
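
The statistics discussed above can be sketched as follows. The Hill estimator used here for the tail coefficient is an assumption (the estimator behind the table is not specified in this section), and the function names and synthetic samples are purely illustrative.

```python
import numpy as np

def skewness(x):
    """Third standardised moment."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

def excess_kurtosis(x):
    """Fourth standardised moment minus 3 (zero for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

def hill_tail_coefficient(x, quantile=0.98):
    """Hill estimate of the upper-tail exponent from samples above `quantile`.

    Requires a positive threshold; pass -x to study the lower tail.
    """
    x = np.asarray(x, dtype=float)
    threshold = np.quantile(x, quantile)
    tail = x[x > threshold]
    if threshold <= 0 or tail.size == 0:
        raise ValueError("need a positive threshold and at least one tail sample")
    return float(1.0 / np.mean(np.log(tail / threshold)))

rng = np.random.default_rng(1)
gaussian = rng.standard_normal(200_000)
pareto = 1.0 + rng.pareto(3.0, 200_000)   # classical Pareto, tail exponent 3

print(skewness(gaussian), excess_kurtosis(gaussian))
print(hill_tail_coefficient(pareto))
```

With only a few thousand daily observations, the 2% tail contains a few dozen points, so such an estimate is noisy; this is the caveat raised above about calibrating tail coefficients with confidence.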

3.2.1 Beware the volatility carry 

Those statistics also highlight two of the most common strategies in the volatility space: the short volatility carry and the implied term-structure carry. Both strategies consist in selling volatility. They aim at collecting small but regular positive returns by holding a dangerous short volatility position, thereby playing against the rare but devastating risk event of implied volatility spiking.

• Short Volatility Carry: The short volatility carry aims at harvesting the well-known volatility risk premium, i.e. the difference between implied and realized volatilities, by going short realized volatility against the long implied premium. On the current dataset, the volatility risk premium can be estimated to be around $1 - \sigma_Z^2 \approx 36\%$ (obviously excluding transaction costs). This is quite large, and explains why the volatility risk premium is so popular. However, the volatility of the risk premium can be roughly approximated around $\sqrt{2 + \kappa_Z}\,\sigma_Z^2 \approx 120\%$, by no means insignificant.

• Implied Term-Structure Carry: The implied term-structure strategy exploits the negative carry present in the volatility term-structure. The instantaneous variance curve is usually in contango, reflecting the greater uncertainty associated with further maturities. On this dataset, the term-structure premium is characterised by the negative trends of volatility factors, with magnitude above 50%! The iPath S&P 500 VIX Short-Term Futures ETN, which systematically rolls a long position in short-term VIX futures, constitutes probably the most archetypal example of the cost of carry.
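
The two orders of magnitude quoted for the short volatility carry can be checked from the statistics of the spot factor. The sketch below assumes that the realised variance over one step scales like $\sigma_Z^2 x^2$, with $x$ the normalised spot factor (unit variance, excess kurtosis $\kappa_Z$), and neglects the trend of the factor; the variable names are illustrative.

```python
import math

sigma_z = 0.796   # volatility of the spot factor, from the statistics table
kappa_z = 1.59    # excess kurtosis of the spot factor, from the same table

# Average premium: implied variance (normalised to 1) minus realised variance.
risk_premium = 1.0 - sigma_z ** 2

# Var(x^2) = E[x^4] - 1 = 2 + kappa_z for a normalised factor x, so the
# standard deviation of sigma_z^2 * x^2 is sqrt(2 + kappa_z) * sigma_z^2.
risk_premium_vol = math.sqrt(2.0 + kappa_z) * sigma_z ** 2

print(f"premium ~ {risk_premium:.1%}, volatility ~ {risk_premium_vol:.0%}")
# premium ~ 36.6%, volatility ~ 120%
```

The ratio of the two numbers, roughly 0.3, gives a sense of how poorly the premium compensates for its own variability over a single step.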
Although the above statistics have been computed over the whole period, it should be clear that they are not constant through time. For instance, the ratio of realised over implied volatilities averages at a value around $\sigma_Z \approx 80\%$, but has reached very large values over the considered period. The distribution of the squared spot factor $\delta Z_t^2$ is representative of the danger of the short volatility position.

Predicting the evolution of a volatility risk premium is difficult. In practice, very few practitioners have been successful over the long run. Yet, despite these obvious dangers, a host of volatility strategies, implementing sophisticated rule-based strategies which are supposedly able to anticipate risk reversals, have recently surfaced and received surprising popularity... until the next crisis?


3.3 Joint Densities and Non-linearities

We now study the characteristics of the joint variations. In addition to the negative correlation between spot and volatility, we expect non-linearities to be present. Volatility tends to react linearly to spot shocks up to a certain point, below which it tends to spike rapidly.

We define the centered and normalized variables

$$\overline{\delta Z}_t = \frac{\delta Z_t - \mu_Z\,\delta t}{\sigma_Z\sqrt{\delta t}} \quad\text{and}\quad \overline{\delta W}^\alpha_t = \frac{\delta W^\alpha_t - \mu_{W^\alpha}\,\delta t}{\sigma_{W^\alpha}\sqrt{\delta t}}$$

and display their joint variations in Figure 6. As we can observe, there exists a small non-linear relationship that is more pronounced for the fast factor.

[Figure 6 plot: two scatter panels, horizontal axes from -4 to 4]

Figure 6: Joint Dynamics. The fast (left) and slow (right) volatility factors $\delta W^\alpha_t$ are plotted against the spot factor $\delta Z_t$. We also graph the result of the non-linear modeling through the functional relationship $f_\alpha$. The estimated parameters are provided.

We suggest modeling the daily variations of volatility as the sum of two terms, a non-linear dependency on the spot factor and an exogenous factor modeled by a Gaussian $U^\alpha_t$:

$$\delta W^\alpha_t = f_\alpha(\delta Z_t) + \sqrt{\gamma_\alpha}\,U^\alpha_t, \tag{11}$$

where the function $f_\alpha$ is chosen as a quadratic function

$$f_\alpha(\delta Z_t) = a_\alpha(\delta Z_t^2 - 1) - b_\alpha\,\delta Z_t.$$

More complex relationships could have been introduced, but we found this simple quadratic functional to capture well the non-linear dependency of volatility on the spot. It is interesting to observe that by working at a finite time scale, non-linearities can be easily introduced. This would not be the case with an infinitesimal modeling, as the quadratic variation $\overline{dZ}_t^2 = 1$ would reduce the functional $f_\alpha$ to a simple linear dependency.

Straightforward computations lead to:

$$a_\alpha = \frac{E[\delta W^\alpha_t(\delta Z_t^2 - \zeta\,\delta Z_t)]}{2 + \kappa - \zeta^2}$$

$$b_\alpha = \frac{\zeta\,E[\delta W^\alpha_t\,\delta Z_t^2] - (2 + \kappa)\,E[\delta W^\alpha_t\,\delta Z_t]}{2 + \kappa - \zeta^2}$$

$$\gamma_\alpha = 1 - a_\alpha^2(2 + \kappa) - b_\alpha^2 + 2a_\alpha b_\alpha\zeta$$

$$\rho = \sqrt{\gamma_\alpha\gamma_\beta}\,E[U^\alpha U^\beta] + (2 + \kappa)\,a_\alpha a_\beta + b_\alpha b_\beta - \zeta\,(a_\alpha b_\beta + a_\beta b_\alpha)$$


The exogenous variables $U^\alpha_t$ are not strongly correlated. Besides, visual inspection seems to indicate independence from the spot variable $\delta Z_t$; note that we have $E[U^\alpha_t\,\delta Z_t] = 0$ by construction. The distance correlation between $\delta W^\alpha_t$ and $\delta Z_t$ is around 70%, but falls below 10% between $U^\alpha_t$ and $\delta Z_t$.
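
Distance correlation is used here because, unlike the linear correlation, it can detect non-linear dependence. A minimal sketch of the sample statistic, built from double-centred pairwise distance matrices, is given below; the function name and the synthetic data are illustrative assumptions.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two 1-D samples of equal length."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    if x.size != y.size:
        raise ValueError("x and y must have the same length")

    def centred_distances(v):
        d = np.abs(v[:, None] - v[None, :])            # pairwise distances
        # remove column means, row means, and add back the grand mean
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

    a, b = centred_distances(x), centred_distances(y)
    dcov2 = np.mean(a * b)                              # squared distance covariance
    norm = np.sqrt(np.mean(a * a) * np.mean(b * b))     # product of distance variances, square-rooted
    if norm == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / norm))

rng = np.random.default_rng(2)
z = rng.standard_normal(1000)
noise = rng.standard_normal(1000)

dcor_linear = distance_correlation(z, 2.0 * z + 1.0)    # exact linear link
dcor_quadratic = distance_correlation(z, z ** 2)        # purely non-linear link
dcor_independent = distance_correlation(z, noise)       # no link
print(dcor_linear, dcor_quadratic, dcor_independent)
```

The quadratic case is the relevant one: the linear correlation between a symmetric variable and its square is close to zero, whereas the distance correlation is not.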

Of the two volatility factors $\delta W^\alpha_t$, only the fast one requires a quadratic term. The contribution of the exogenous factor, as measured by $\gamma_\alpha$, is significant in both cases, with a larger impact for the fast factor, $\gamma_F > \gamma_S$. Short-term volatility is wilder and less predictable than longer-term volatility. The use of a third and faster factor, i.e. $k > k_F$, would have generated a slightly higher convexity, but not by a significant amount.
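
The moment formulas for $a_\alpha$, $b_\alpha$ and $\gamma_\alpha$ amount to a least-squares projection of the normalised volatility factor on $\delta Z_t^2 - 1$ and $-\delta Z_t$. The sketch below implements them on synthetic data; the function name `fit_quadratic_link` and the chosen "true" parameters are assumptions made for illustration only.

```python
import numpy as np

def fit_quadratic_link(dz, dw):
    """Moment estimators of (a, b, gamma) in
    dW = a * (dZ**2 - 1) - b * dZ + sqrt(gamma) * U,
    after centring and normalising both series to unit variance."""
    dz = np.asarray(dz, dtype=float)
    dw = np.asarray(dw, dtype=float)
    z = (dz - dz.mean()) / dz.std()
    w = (dw - dw.mean()) / dw.std()
    zeta = np.mean(z ** 3)              # skewness of the spot factor
    kappa = np.mean(z ** 4) - 3.0       # excess kurtosis of the spot factor
    m1 = np.mean(w * z)                 # E[dW dZ]
    m2 = np.mean(w * z ** 2)            # E[dW dZ^2]
    denom = 2.0 + kappa - zeta ** 2
    a = (m2 - zeta * m1) / denom
    b = (zeta * m2 - (2.0 + kappa) * m1) / denom
    gamma = 1.0 - a ** 2 * (2.0 + kappa) - b ** 2 + 2.0 * a * b * zeta
    return float(a), float(b), float(gamma)

# Synthetic check with a Gaussian spot factor (zeta = 0, kappa = 0).
rng = np.random.default_rng(3)
n = 200_000
z = rng.standard_normal(n)
u = rng.standard_normal(n)
a_true, b_true = 0.1, 0.6
gamma_true = 1.0 - 2.0 * a_true ** 2 - b_true ** 2   # keeps Var(dW) = 1
w = a_true * (z ** 2 - 1.0) - b_true * z + np.sqrt(gamma_true) * u

a_hat, b_hat, gamma_hat = fit_quadratic_link(z, w)
print(a_hat, b_hat, gamma_hat)
```

By construction the residual of this projection is uncorrelated with both $\delta Z_t$ and $\delta Z_t^2$, and $\gamma_\alpha$ is the share of variance left to the exogenous factor.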

Going one step further, we deduce from Equation 11 that the
variance curve deformations can also be modeled as the sum
of two independent terms, one quadratic functional of the
spot factor $\delta Z_t$, and an additional independent component:

a 
where $V_t$ is a standard normal variable and $\sigma_V$ a
maturity-dependent volatility:


with Vt a standard Gaussian variable and parameters verify¬ 
ing: 




] 0a0i3OJ°‘{u, t, t, ^tha'll3E[U°‘UI^] 


The magnitude of the volatility $\sigma_V$ should draw attention to the rather unpredictable nature of implied volatility. The exogenous factor contributes to approximately a third of the total volatility. Although it is certainly true that the dynamics of volatility are strongly linked to its underlying, reducing them to a simple functional relationship would greatly underestimate the subtle behavior of volatility. This naturally leads us to discuss some limitations of GARCH models.



3.3.1 Link with GARCH models 


(fit = fila where 9^ = 9ae~'‘°‘^^ 

4>l = fi:Jc.{2aa‘^ym+^)V5i 
4>t = 

+ Ea 9a{aa^fifii + baf^)St 

4>t = + 


The difference between Eq. 13 and 14 comes from which variance is modeled; in the former case, it is a measure of the true realized variance $E^{\mathbb{P}}_t[\sigma^2]$, whereas in the latter it corresponds to the implied market view $E^{\mathbb{Q}}_t[\sigma^2]$. However, as we can expect market expectations to provide a reasonable estimate of the true hidden distribution, both approaches should be equivalent at first-order, with the discrepancy being accounted for by the presence of the normalizing variance factor $\sigma_Z^2$.


Direct calibration on the current data set shows a good
alignment with model values (where we have averaged over
the time-dependency):


GARCH models are very popular in financial econometrics. They generally assume that the conditional volatility is only driven by a single source of risk present in the spot variations $\delta Z_t$. Those models do not attempt to model the implied volatility, but focus instead on the “true” variance of future spot returns. As such, they voluntarily ignore the independent nature of volatility. Many variants exist [14], but, in a nutshell, they attempt to model the next step variance as a function of past variances and past returns $r_{t-i}$ for $i > 0$.

A typical asymmetric GARCH(1,1) model would be expressed as:


^2 


2 

I — + (1 + P^)(Jq ^ + residues (13) 
ot ’ 




where the constants $\phi^i$ would be calibrated on the historical time series. The modeling equation 11 with its quadratic terms hints at a potential GARCH model. One can develop equation 12 at first-order to find the following expression:


St+(5 


2 


Va 







r 

Data    0.15%   1.53%   0.15%   -3.13%   127%
Model   0.00%   0.96%   0.15%   -1.50%   141%


The quadratic term is larger when directly calibrated on the data set than when computed from model parameters. This is partly due to the implied leverage and the volatility clustering effects that we analyse in the next section. In a stochastic model with a term-structure of volatility, a shock $\delta Z_t$ at time $t$ has an impact on the entire variance curve. This impact is reflected in the slope of the term-structure, and indirectly in the coefficient. On the other hand, the modeling in Eq. 13 and the approximation provided by Eq. 14 are near-sighted in the sense that they ignore longer-term past contributions and attempt to explain variance changes through past returns only.


Stochastic volatility models with a term-structure of implied variance, such as the current model, are a good imitation of market realities. In particular, long range phenomena are more easily captured thanks to the modeling of the term-structure (see sect. 3.3.2) than with GARCH models*.

* Long-term dependencies can obviously be introduced in GARCH models, but calibration can then prove difficult as the number of degrees of freedom increases.
3.3.2 Stylized Facts

Despite being multifaceted, spot and volatility exhibit a few stable features, which are often referred to as ‘stylized facts’ in the literature; among those, the negative steepness of the implied volatility smile (reflecting the negative spot/vol correlation), the heteroscedasticity of volatility (i.e. the phenomenon of volatility clustering), and the implied leverage effect (i.e. the tendency of ATM volatility to increase as the underlying market goes down).

Having modeled the non-linear spot/vol relationship, we study the modeling implications on the implied leverage and the volatility clustering effects. Both relate today's news to tomorrow's volatility. The implied leverage effect quantifies the impact of a shock today onto tomorrow's volatility (sign to amplitude), whereas the volatility clustering relates today's volatility to tomorrow's (amplitude to amplitude). In our model, the straightforward link is obviously provided by the term-structure of volatility: news today has an impact on the full term-structure of variance, thereby impacting tomorrow's volatility.

We denote by $\tilde{X}$ the detrending of the stochastic variable $X$, i.e. $\tilde{X} = X - E[X]$. The leverage correlation function $\mathcal{L}(t, \Delta)$ measures the dependency between a (detrended) return $\tilde{r}_t$ observed today at time $t$ and tomorrow's volatility $\tilde{\sigma}_{t+\Delta}$ measured at time $t + \Delta$. Straightforward computation by iterative conditioning on the filtrations at times $t + \Delta$ and $t + \delta t$, followed by the use of the model equation (see [21]), leads to:

$$\mathcal{L}(t, \Delta) = \sum_\alpha \theta_\alpha\,\omega^\alpha(t, t + \Delta, \xi_t)\,\underbrace{E[\delta Z_t\,\delta W^\alpha_t]}_{a_\alpha\zeta - b_\alpha}\,\sqrt{\delta t}$$

Therefore, in the case of equities, the main driver of the leverage correlation function is the spot/vol correlation $E[\delta Z_t\,\delta W^\alpha_t] \approx -b_\alpha$. Non-linearities are negligible.

The volatility clustering function $\mathcal{C}(t, \Delta)$ measures the correlation between today's volatility computed at time $t$ and tomorrow's volatility computed at time $t + \Delta$. Using similar derivation steps, we find that:

$$\mathcal{C}(t, \Delta) = \sum_\alpha \theta_\alpha\,\omega^\alpha(t, t + \Delta, \xi_t)\,\underbrace{E[\delta Z_t^2\,\delta W^\alpha_t]}_{a_\alpha(2 + \kappa) - b_\alpha\zeta}\,\sqrt{\delta t}$$

In the case of volatility clustering, the complete non-linear relationships between the spot returns and the volatility factors must be taken into account. It is interesting to note that if we had assumed a standard normal modeling, i.e. assuming the spot factor $\delta Z_t$ to be Gaussian by enforcing $\zeta = 0$ and $\kappa = 0$, we would not have been able to match the data as accurately. The volatility clustering is as much a result of non-linearities (contributing a third, through the term $a_\alpha(2 + \kappa)$) as of non-Gaussian effects (contributing two thirds, through the term $b_\alpha\zeta$).

[Figure 7 panels: "Leverage Correlation Function" and "Volatility Clustering Function"]

Figure 7: Leverage Correlation & Volatility Clustering

Figure 7 displays the leverage correlation function and the volatility clustering function estimated from our model and the data set. The augmented stochastic model captures both stylized facts well.
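
For readers who wish to reproduce the empirical side of such a comparison, the sketch below estimates the two functions from a return series alone. It is only a rough proxy of what is described above: absolute returns stand in for volatility (an assumption; the study works with implied volatility), and the simulated asymmetric-variance process and all its parameters are hypothetical, used only to exercise the estimators.

```python
import numpy as np

def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], for lag >= 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x[:-lag], y[lag:])[0, 1])

def leverage_and_clustering(returns, max_lag=10):
    """Empirical leverage (sign to amplitude) and clustering (amplitude to
    amplitude) correlations for lags 1..max_lag."""
    r = np.asarray(returns, dtype=float)
    r = r - r.mean()                      # detrended returns
    amplitude = np.abs(r)                 # crude volatility proxy (assumption)
    lags = np.arange(1, max_lag + 1)
    leverage = np.array([lagged_correlation(r, amplitude, k) for k in lags])
    clustering = np.array([lagged_correlation(amplitude, amplitude, k) for k in lags])
    return lags, leverage, clustering

# Toy process: negative shocks raise the next-step variance more than positive ones.
rng = np.random.default_rng(4)
n = 50_000
eps = rng.standard_normal(n)
returns = np.empty(n)
var = 1e-4
for t in range(n):
    returns[t] = np.sqrt(var) * eps[t]
    var = 5e-6 + var * (0.75 + 0.10 * (eps[t] - 1.0) ** 2)

lags, leverage, clustering = leverage_and_clustering(returns)
print(leverage[:3], clustering[:3])
```

On such a process the leverage correlation is negative and the clustering correlation positive, the two signs expected for equity indices.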


3.4 Towards a term-structure of vol of vol 

Although spot returns $r_t$ display little autocorrelation at the scale of a day*, this is not the case of the return magnitudes, measured as $|r_t|$ or $r_t^2$. This phenomenon is known as the volatility clustering effect that we studied in detail in the previous section.

In our model, the variables $\delta Z_t$ and $\delta W^\alpha_t$ are assumed to be independent and identically distributed. As such, they should not show any autocorrelation. This is verified by the spot factors $\delta Z_t$, their magnitudes $|\delta Z_t|$, as well as by the volatility factors $\delta W^\alpha_t$. However, the magnitudes of volatility factors $|\delta W^\alpha_t|$ are not independent and a positive autocorrelation exists. A simple autocorrelation check over the whole term-structure confirms the finding: the absolute variations of the curve observed at different maturities are also autocorrelated (a similar check can be conducted on the variations of VIX futures).

* A small negative auto-correlation seems to exist for some equity indices, giving rise to mean-reversion strategies.

This autocorrelation property hints at a similar behaviour for the variations of volatility as for the spot returns $r_t$: in the same way that a shock today has a lasting impact on the future market volatility, a volatility shock today also impacts the future volatility of volatility. As the former translates into a term-structure of implied volatility, a similar term-structure of volatility of volatility also exists.

To better understand the behaviour of volatility of volatility, 
we look into the VVIX index. Similarly to the VIX index, 
which represents the expected volatility of the SPX index 
over the next 30 calendar days, the VVIX index reflects the 
expected volatility of the VIX index. In the same vein, it is 
computed as an interpolation of VIX options, which are writ¬ 
ten on VIX futures. Because VIX futures react differently to 
the arrival of new information based on their time to expiry, 
the situation is slightly different than for the VIX index. The 
volatility of short-term VIX futures is naturally higher than 
the volatility of further-away VIX futures, thereby generating 
a term-structure that is reflected in the computation of the 
VVIX index. This is not the case with the VIX index that 
is computed from options that are all written on the same 
underlying. 

In our model with constant vol of vol parameters, and assuming a flat term-structure of volatility, the expected total variance of a VIX future of maturity $T_1$ can be evaluated to be:

1 dVT 1 

J-i-tJt V„' 4^ V-^. 

’ denoted 

with ^ = f^a,/ 3 ^a(AT)^/ 3 (AT). The term-structure of the 
volatility of VIX futures is therefore decreasing: as we just mentioned, further-away VIX futures react less to news, and as such, are less volatile. A comparison with historical data (the VVIX term-structure is made available by the Chicago Board Options Exchange) shows that our model significantly under-estimates the volatility of volatility embedded in the pricing of VIX options. This is expected. Similarly to the volatility risk premium, a volatility of volatility risk premium exists. Careful inspection also reveals that the term-structure is dependent on the level of vol of vol: with higher vol of vol (i.e. higher VVIX), the historical term-structure becomes steeper. This clearly shows a limitation of our approach: with our constant vol of vol parameters, the term-structure of Eq. 15 is decreasing but does not change through time.

Going one step further, we compute the value of the VVIX index in our model. Because of the decreasing term-structure and the interpolation methodology, the value depends on the maturity of the chosen expiries (again, this is not the case with the VIX index). In our model, the square of the expected VVIX index should be equal to:

E ^9a9e ( 25 „+/ 3 (T 2 -t)- 

a, 13 

with $T_1$ and $T_2$ the first and second expiries used in the interpolation process and $\Delta T = T_2 - T_1$. This implies an average VVIX value at around 75% that is significantly lower than the realized historical average (at around 86%), thereby implying that the vol of vol risk premium embedded in VIX options is significant.


Figure 8 represents the ratio between the squared official VVIX index and our model. The ratio, which we denote $\lambda_t$, appears to be mean-reverting towards a long-term value.



Figure 8: Volatility of Volatility. We display the ratio between the squared VVIX index and our theoretical value, computed with fixed vol of vol parameters.

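The dynamics of the ratio suggest that a mean-reverting process could be used to model the vol of vol randomness. For instance:

$$\frac{d\lambda_t}{\lambda_t} = -k_\lambda\left(\log\lambda_t - \left(\log\lambda_\infty + \frac{\sigma_\lambda^2}{2k_\lambda}\right)\right)dt + \sigma_\lambda\,dW^\lambda_t$$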

with calibrated parameters 



kx 

hx 

crx 

Ca 

i^x 

126% 

16 

0 % 

152% 

0.78 

2.66 


and correlation with other factors

$\rho$                   $\delta Z_t$   $\delta W^F_t$   $\delta W^S_t$
$\delta W^\lambda_t$     -60%           56%              59%
This would generate a term-structure of vol of vol equal to

$$E_t[\log\lambda_u] = \log\lambda_\infty + (\log\lambda_t - \log\lambda_\infty)\,e^{-k_\lambda(u - t)} \tag{16}$$
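
The term-structure of Eq. 16 can be checked numerically. The sketch below assumes that, by Itô's lemma, $X_t = \log\lambda_t$ follows $dX_t = -k_\lambda(X_t - \log\lambda_\infty)\,dt + \sigma_\lambda\,dW^\lambda_t$ when the drift of $d\lambda_t/\lambda_t$ contains the correction $\sigma_\lambda^2/(2k_\lambda)$; it then compares a simple Euler simulation with the closed form. Function names and parameter values are illustrative, not the calibrated ones.

```python
import numpy as np

def expected_log_lambda(log_lam_t, log_lam_inf, k_lam, horizon):
    """Closed-form conditional mean of log(lambda) at t + horizon (Eq. 16)."""
    return log_lam_inf + (log_lam_t - log_lam_inf) * np.exp(-k_lam * horizon)

def simulate_log_lambda(log_lam_t, log_lam_inf, k_lam, sigma_lam,
                        horizon, n_paths=100_000, n_steps=200, seed=0):
    """Euler scheme for dX = -k (X - X_inf) dt + sigma dW, with X = log(lambda)."""
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    x = np.full(n_paths, float(log_lam_t))
    for _ in range(n_steps):
        noise = rng.standard_normal(n_paths)
        x += -k_lam * (x - log_lam_inf) * dt + sigma_lam * np.sqrt(dt) * noise
    return x

k_lam, sigma_lam = 1.26, 1.52            # illustrative values
log_lam_t, log_lam_inf = np.log(2.0), 0.0
horizon = 0.5                            # in years

closed_form = expected_log_lambda(log_lam_t, log_lam_inf, k_lam, horizon)
monte_carlo = simulate_log_lambda(log_lam_t, log_lam_inf, k_lam, sigma_lam, horizon).mean()
print(closed_form, monte_carlo)
```

The two values agree up to Monte Carlo and discretisation error, and the exponential decay towards $\log\lambda_\infty$ is what makes the resulting vol of vol term-structure time-dependent.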

Within this augmented framework, the instantaneous vari¬ 
ance would follow a diffusion equation: 





a=l 


At first-order in the vol of vol parameters, only minor adjustments to our stochastic volatility framework would be required; however, it would become less tractable. The convexity correction would be altered, so that with higher vol of vol, the convexity correction would be slightly increased. More importantly, the presence of stochastic vol of vol means that the total variance of a VIX future of maturity $T_1$ would reflect the mean-reverting behaviour of vol of vol. By using a second-order approximation derived from Eq. 16


K = Et[Xu] 


^oof ^ ) 

^oo 


(u — t) 

e 2 


the total variance of a VIX future of maturity Ti would differ 


from eq. 15 


E 


^2 

Q^9^Q-^t-(ka,+k(3)Ti 

4(Ti - t) 



\U ^2 

Xoo 


The impact of vol of vol would be minimal for short maturities, a small mean-reversion rate $k_\lambda$, or a small level of vol of vol. In all other cases, the above integral would need to be evaluated numerically. As the vol of vol increases, the term-structure becomes steeper, as expected from historical data. We keep the integration of vol of vol information in the model for future work.


4 Consequences 


We pursue our exploration of the spot/vol properties by delving into the theoretical implications of the non-linearities. We investigate the impact on the pricing and hedging of derivatives, an area of active research. We first focus on standard vanilla options and review the link between the skewness of an underlying and the implied skew (section 4.1). We then scrutinize the relationship between implied skew and the evolution of the ATM implied volatility (section 4.2). Finally, we look at some of the implications on the volatility of annualized variance (section 4.3).


4.1 Skewness and Skew 

We investigate the relationship between the skewness $\zeta_{t,T}$ of returns and the skew $\mathrm{Skew}_{t,T}$ of the implied option smile. It is a well-known fact that the skewness of the returns generated by a model is linked to the implied skew of the options priced by the same model [2]. This is not surprising, since the skewness and the skew are both functions of the spot-volatility correlation. At first-order in vol of vol parameters, the relationship can be expressed as $\mathrm{Skew}_{t,T} = \zeta_{t,T} / (6\sqrt{T - t})$. However, this expression is only true when the model is linear. When some non-linearities are present, e.g. through the function $f_\alpha$, the equality breaks down as pointed out in [21].

We investigate the relationship within the limits of our model. We verify that when non-linearities are ignored, i.e. assuming $a_\alpha = 0$, the equality is valid at first-order. The presence of non-linearities alters this relationship. However, in the case of equities where the linear spot/vol correlation dominates, the impact is negligible.


4.1.1 Skewness of returns 

The skewness of the returns can easily be computed from the moments of order 2 and 3. Following the same derivation steps as in [3], we find at first-order that the skewness can be expressed as:


^ (E. 

Eu C E.<u Ct)E[SZudW:]SuSv 

A “ (E„W« 


(17) 


Assuming a relatively flat term-structure of volatility, the above equation further simplifies to

$$\zeta_{t,T} \approx \frac{\zeta}{\sqrt{N}} + 3\sqrt{T - t}\,\sum_\alpha \theta_\alpha\,E[\delta Z\,\delta W^\alpha]\,h(k_\alpha(T - t))$$

where $h(x)$ is defined as $h(x) = \frac{1}{x^2}\int_0^x u\,g(u)\,du = \frac{x - 1 + e^{-x}}{x^2}$.

The skewness of the returns for maturity $T - t$ is the result of the intrinsic skewness of the spot process at time scale $\delta t$, and of the spot-volatility correlation. With no vol of vol, i.e. $\theta_\alpha = 0$, the skewness is decreasing in $1/\sqrt{N}$ as expected for a process with independent increments. This term quickly becomes negligible.


14 













4.1.2 Smile of volatility 


To investigate the impact of vol of vol on the implied smile of options, we introduce a scaling parameter $\lambda$. The parameter $\lambda$ controls the amount of stochastic volatility in the model. With no vol of vol, i.e. $\lambda = 0$, our model has constant volatility equal to the var-swap volatility $\sigma_{VS}(t, T)$ and the implied volatility smile of options is flat. In the presence of vol of vol, the shape of the implied volatility surface is altered, the ATM volatility shifts and the skew deviates from zero. This property is a well-known fact of stochastic volatility models, and has been quantified accurately in the case of linear models at second-order in recent work.


When non-linearities are present, the impact of stochastic parameters on the implied volatility is different, as pointed out in the case of a single-factor GARCH model. We conduct a similar analysis and compute the implied volatility smile in our augmented non-linear stochastic volatility model. At first-order in the strike $K = S_t + dK$, the volatility smile is approximated by:

dK 

a^{K,t,T) « (TATM{St,t,T) + Skew^j, x — 

where $\sigma^\lambda(K, t, T)$ is the Black-Scholes implied volatility observed at strike $K$. Note that $\sigma^{\lambda=0}(K, t, T) = \sigma_{VS}(t, T)$ for all strikes $K$. The price of a call option $F_K(\lambda) = E_t[(S_T - K)^+]$ of strike $K$ is a function of the vol of vol parameters through the parameter $\lambda$. Pricing is achieved under the risk-neutral measure, i.e. we assume that $\delta Z_t$ and $\delta W^\alpha_t$ are standard normal variables, and we neglect the drift component. At first-order, the volatility shift implied by the presence of stochastic volatility at a specific strike $K$ can be computed as:

5a{K,t,T) = a^{K,t,T) - avs{t,T) = 

Vega^f 

where the vega is the standard Black-Scholes vega. We deduce
that:


• ATM Spread The ATM volatility is shifted by 
a\Sut,T) - avsit,T) = 


• Skew The skew generated by the stochastic volatility 
is equal to 


Skew^j^ = A 


F^jO) 

Vega^ 


Xfeg&sJ dK 


After some tedious computations (see app. 6.2), we find that
the ratio can be expressed as:


E' 


Eu krSuE.^JvL0-{u,v)^^Et[UA, + B,W)] 




where $W$ is a standard Gaussian variable and $A_v$, $B_v$ are
defined as:


Ay 

By 






1 

+ log 





The expression above is the same as the one derived in [21]. This is not surprising as the exogenous volatility contributions have no measurable impact on the expression of the implied smile or skew. From the definition of the function $f_\alpha$, we find that the above expectation can be expressed as:

$$E_t[f_\alpha(A_v + B_v W)] = a_\alpha(A_v^2 + B_v^2 - 1) - b_\alpha A_v$$
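
This identity follows from $E[(A + BW)^2] = A^2 + B^2$ and $E[A + BW] = A$ for a standard Gaussian $W$. A quick numerical check by Gauss-Hermite quadrature is sketched below; the coefficient values are hypothetical and the helper names are illustrative.

```python
import numpy as np

def f_quadratic(x, a, b):
    """Quadratic link f(x) = a * (x**2 - 1) - b * x."""
    return a * (x ** 2 - 1.0) - b * x

def gaussian_expectation(func, n_nodes=20):
    """E[func(W)] for a standard Gaussian W, by Gauss-Hermite quadrature
    (probabilists' convention, weight exp(-x**2 / 2))."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    return float(np.sum(weights * func(nodes)) / np.sqrt(2.0 * np.pi))

a, b = 0.1, 0.6        # hypothetical quadratic and linear coefficients
A, B = -0.3, 0.8       # hypothetical shift and scale

lhs = gaussian_expectation(lambda w: f_quadratic(A + B * w, a, b))
rhs = a * (A ** 2 + B ** 2 - 1.0) - b * A
print(lhs, rhs)
```

The quadrature is exact for polynomials of this degree, so both sides coincide up to rounding error.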


Linear Case. We first consider a linear model and ignore quadratic terms by simply assuming $a_\alpha = 0$. It is then straightforward to check that the expression $\mathrm{Skew}_{t,T} = \zeta_{t,T} / (6\sqrt{T - t})$ becomes valid at first-order (remember that for pricing we assume that $\zeta = 0$, i.e. we neglect the intrinsic skewness of the returns). In addition, one can express exactly the value of the ATM spread and skew implied by the linear model. For the sake of simplicity, we assume that the term-structure of variance is relatively flat; we find that:

Spreadliin = -Y,^Hka{T - t)){T - t)a^s{t,T) 

a 

Skew|ii„ = - y; - t)) 

a 

These are exactly the results derived in a more general setting in [5]. The presence of vol of vol decreases the ATM volatility and generates skew proportionally, with the equality

Spreadliin = Skew|iin. 

Non-Linear Case. In the presence of non-linearities, the skew and the skewness are no longer directly related. In the case of a flat term-structure, we can express the difference $\mathrm{Skew}_{t,T} - \zeta_{t,T} / (6\sqrt{T - t})$ as:

E da^ah{kQ,(T t)) 

- - -CTvS V Ot 

a 

The impact of non-linearities is measured by the magnitude 

n 

of the ratio ^aysV^ ~ ^frcrysV^. In the case of the SPX 
index, the dominant factor remains the linear spot/vol correlation. Non-linear effects are negligible and the ratio is close to zero. However, when the correlation becomes small and/or non-linearities large, the difference and the ratio will become observable. This would be the case for assets where the smile is less steep and for which the spot/vol correlation is close to zero, such as FX assets. Note also that when the time frame of observation becomes small, i.e. $\delta t \to 0$, we end up with the linear case.

One can also compute the impact on the spread of ATM 
volatilities to find: 

ASpread = Spread |non-iin — Spread jun 

= 0aaah{ka{T - t)) - t)(7yg 

a ^ ^ 

which we find to be also negligible. Although non-linearities would alter the relationship between the skewness of the returns and the skew of the implied volatility, the magnitude of the correction is small and can safely be ignored.

4.2 Skew-Stickiness ratio 

In recent work [4], Bergomi showed that two a priori very distinct features of a volatility model, the static shape of the implied smile and the dynamics of the ATM volatility, are strongly linked. To measure their dependency, he introduces the Skew-Stickiness Ratio $R(t, T)$ as:

$$R(t, T) = \frac{1}{\mathrm{Skew}(t, T)}\,\frac{E_t[\delta\sigma_{ATM}(t, T)\,\delta S_t]}{E_t[\delta S_t^2]}$$

This ratio provides a quantitative interpretation of the notions of sticky-strike ($R = 1$), sticky-delta ($R = 0$), and local-vol ($R = 2$).


with $g(x) = \frac{1 - e^{-x}}{x}$. Ignoring non-linearities (setting $a_\alpha = 0$), we therefore find that the skew-stickiness ratio can be expressed as

$$R(t, T) = \frac{\sum_\alpha \theta_\alpha b_\alpha\,g(k_\alpha(T - t))}{\sum_\alpha \theta_\alpha b_\alpha\,h(k_\alpha(T - t))}$$

which is exactly the expression found by Bergomi in [4]. In the limit of small maturities, the skew-stickiness ratio converges to 2. This value makes perfect sense if one interprets the smile as the average over all paths of volatilities weighted by the gamma of the option (see footnote). In the case of long maturities, the ratio converges towards 1.
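
The two limits can be illustrated numerically. The sketch below assumes the linear-case ratio is a weighted average of $g$ over a weighted average of $h$, with $g(x) = (1 - e^{-x})/x$ and $h(x) = (x - 1 + e^{-x})/x^2$; the weights $\theta_\alpha b_\alpha$ and the mean-reversion rates used are hypothetical values chosen for illustration.

```python
import numpy as np

def g(x):
    """g(x) = (1 - exp(-x)) / x, with g(0+) = 1."""
    x = np.asarray(x, dtype=float)
    return -np.expm1(-x) / x

def h(x):
    """h(x) = (x - 1 + exp(-x)) / x**2, with h(0+) = 1/2."""
    x = np.asarray(x, dtype=float)
    return (x + np.expm1(-x)) / x ** 2

def skew_stickiness_ratio(tau, theta, b, k):
    """Linear-case ratio for time to maturity tau (in years)."""
    theta, b, k = (np.asarray(v, dtype=float) for v in (theta, b, k))
    weights = theta * b
    x = k * tau
    return float(np.sum(weights * g(x)) / np.sum(weights * h(x)))

theta = np.array([1.0, 0.4])   # hypothetical vol of vol loadings (fast, slow)
b = np.array([0.7, 0.6])       # hypothetical linear spot/vol coefficients
k = np.array([8.0, 0.5])       # hypothetical mean-reversion rates (per year)

for tau in (1e-4, 0.25, 1.0, 1e4):
    print(tau, skew_stickiness_ratio(tau, theta, b, k))
```

For intermediate maturities the ratio lies between the two limits, and how fast it decays from 2 towards 1 is governed by the mean-reversion rates.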

Although non-linearities alter the value of the skew-stickiness ratio, we did not find the differences to be significant for the SPX index.

4.3 Volatility of Variance Swap 

In this final section, we step into the world of volatility 
derivatives and study the volatility of the annualized variance. 
A variance swap provides an exposure to the aggregated 
annualized variance of the returns during a fixed period $[T_1, T_2]$. 
Most often, the returns are computed close to close, but other 
conventions exist. During the life of the trade $T_1 < t < T_2$, 
the annualised variance mark-to-market 

^ ^ J- 1 - 

i *^T2 -Ti ^ 

changes according to daily returns that increase the 
aggregated realised variance, but also due to changes in implied 
volatility corresponding to the remaining variance up to 
maturity. 
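The two drivers of the mark-to-market can be made concrete with a small sketch. It assumes the standard convention that the mark-to-market is the realised variance accrued so far plus the remaining time multiplied by the implied annualised variance, all divided by the length of the period. The function name `variance_swap_mtm`, its arguments and the sample numbers are hypothetical, and discounting is ignored.

```python
import math


def variance_swap_mtm(closes, implied_var, t1, t2, t):
    """Annualised variance mark-to-market at time t (times in years).

    closes: closing prices observed from t1 up to t (close-to-close returns).
    implied_var: implied annualised variance for the remaining period [t, t2].
    """
    log_returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    accrued = sum(r * r for r in log_returns)   # realised part
    remaining = (t2 - t) * implied_var          # implied part
    return (accrued + remaining) / (t2 - t1)


# Hypothetical example: two daily returns into a 3-month swap.
closes = [100.0, 101.0, 99.5]
print(variance_swap_mtm(closes, implied_var=0.04, t1=0.0, t2=0.25, t=2 / 252))
```

A daily return moves the first term, while a remark of the implied variance moves the second; this is the split that the rest of the section quantifies.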


The correlation between the ATM volatility and the spot 
variations can be easily computed in our model: 

Et[5aATMit,T)SSt] ^ ^ dc.J^Qw»{t,u)Et[SW,-SZt] 
EtlSSf] A 

which in the case of a flat term-structure of volatility can be 
expressed as 


Using the additivity of variance, the variation of the 
annualised variance during a time step $\delta t$ (corresponding 
to a day) can be written as the sum of two explicit terms: 


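$$\frac{1}{\Delta T}\Big[\underbrace{\xi_t^t\left(\delta Z_t^2 - \delta t\right)}_{\text{accrued realised}} + \underbrace{(T_2 - t)\left(\delta D_t - E_t[\delta D_t]\right)}_{\text{variation of implied variance}}\Big] \qquad (18)$$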


J2Y9(f^AT-t))E[5Zt5Wn 


where $D_t = \frac{1}{T_2 - t}\int_t^{T_2} \xi_t^u\,du$ denotes the implied annualized variance up 
to maturity.¹⁰ This expression shows clearly that the total 


⁹ We provide a simple intuitive proof. At time $t$, the volatility smile $\sigma_{BS}(K)$ observed for maturity $t + 2\delta t$ can be approximated for strikes around 
the money by $\sigma_{BS}(K = S_t + dK) = \sigma_{BS}(S_t) + \text{Skew}_t\,dK$. At time $t$, the local volatilities corresponding to the time intervals $[t, t + \delta t]$ and $[t + \delta t, t + 2\delta t]$ 
are denoted $\sigma_t$ and $\sigma_{t+\delta t}(S_t + dK)$ respectively. In the limit of small $\delta t$ and small $dK$, we must have $\sigma_{BS}(K = S_t + dK) = \frac{1}{2}\left(\sigma_t + \sigma_{t+\delta t}(S_t + dK)\right)$. 
The local volatility $\sigma_{t+\delta t}(S_t + dK)$ represents the time-$t$ expectation of the ATM volatility defined on the interval $[t + \delta t, t + 2\delta t]$. By writing 
$\sigma_{t+\delta t}(S_t + dK) = \sigma_{t+\delta t}(S_t) + R \times \text{Skew}_t\,dK$, with $R$ an unknown coefficient representing the Skew-Stickiness Ratio, we then immediately find that 
$R = 2$. 

¹⁰ The instantaneous implied variance must verify $\xi_t^t\,\delta t = D_t\,\delta t - (T_2 - t)\,E_t[\delta D_t]$. 
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volatility of the annualised variance, i.e. the square-root of 


Eti 



6y, 

'y 




depends on the covariance parameters $\Omega_{\alpha,\beta}$ (through the 
second term), but also on the kurtosis $\kappa$ of the normalized 
returns $\delta Z_t$ (through the first term). This implies that, even 
under an idealised scenario with no vol of vol, 
the discretisation of the returns generates some volatility. 
This contribution of discrete sampling to the total variance 
is well-known. 
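The role of the kurtosis in the sampling term can be checked on a toy case. The sketch below assumes i.i.d. normalised returns with unit variance, a constant annualised variance `sigma2` and no vol of vol; under those assumptions the variance of the realised annualised variance over `n_returns` observations is `sigma2**2 * (kurtosis - 1) / n_returns`. The function names and the three-point distribution are hypothetical and only serve to verify the formula by exact enumeration.

```python
import itertools
import math


def sampling_variance(sigma2, kurtosis, n_returns):
    """Variance of the realised annualised variance from discrete sampling
    alone, assuming i.i.d. unit-variance normalised returns."""
    return sigma2 ** 2 * (kurtosis - 1.0) / n_returns


def exact_variance_of_realised(sigma2, values, probs, n_returns):
    """Exact variance of sigma2 * mean(z_i^2) when each z_i takes
    `values` with probabilities `probs` (full enumeration)."""
    mean = 0.0
    second = 0.0
    for combo in itertools.product(range(len(values)), repeat=n_returns):
        p = math.prod(probs[i] for i in combo)
        realised = sigma2 * sum(values[i] ** 2 for i in combo) / n_returns
        mean += p * realised
        second += p * realised * realised
    return second - mean * mean


# Three-point distribution with mean 0, variance 1 and kurtosis 3.
values = [-math.sqrt(3.0), 0.0, math.sqrt(3.0)]
probs = [1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0]

print(sampling_variance(0.04, 3.0, 3))
print(exact_variance_of_realised(0.04, values, probs, 3))
```

Fatter tails (larger kurtosis) raise this term, while more observations reduce it in proportion to `1 / n_returns`.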

A third contribution exists, although it has been less 
documented. Large unexpected shocks, i.e. $\delta Z_t^2 \gg 1$, also 
contribute to the total volatility through their correlation 
with the implied volatility. This last term is a direct 
consequence of non-linearities, with large unexpected shocks, 
often negative, being strongly correlated with implied 
variance jumps. We denote by $\rho^{\alpha}_{\text{shocks}}$ this correlation, i.e. 

P^hocks = Et[6W^ X Using the approximation de¬ 

fined in Eq. [E we can compute the correlation explicitly 
to be p^iocks ~ + 2 — 60 ,;^=^. From our estimated 

parameters, both correlations are of order 25%. Similarly to 
the volatility clustering effect, it is essentially the skewness of 
the variable $\delta Z_t$, associated with the negative spot-vol 
correlation, that is responsible for the magnitude of the 
correlation coefficients. 


Under our usual assumption of a relatively flat term-structure 
of variance (see app. 6.3 for the derivations), the total 
variance of a variance swap can be approximated as the sum of 
three terms: 



Figure: Variance Swap Variance. The expected variances of a 
variance swap are plotted as a function of maturity. The variance 
is decomposed into three terms reflecting the impact of 
non-linearities, of the discrete sampling, and of the remarking of 
implied parameters. We also display (black crosses) historical 
variances computed on our dataset. 

The pricing and hedging of volatility derivatives, such as 
volatility swaps and options on volatility or variance, 
depend on the total variance that we have just calculated. 
Because non-linearities increase the total variance, they 
impact the pricing and hedging of these derivatives. In this 
context, the three terms can be directly interpreted as the 
cost of gammas: the gamma on spot, the gamma on the 
remarking of parameters, and the cross-gamma between both. 
However, it is important to realize that the daily hedging 
of a variance derivative using spot variance swaps of the same 
maturity $T_2$ would hedge the three gammas at once. Said 
differently, as long as one knows the correct hedging vega, the 
three terms would be hedged at once. 


sampling impact 
implied parameters 

unexpected shocks 


Vat 

kg, AT) 

^\/ AT AT S a Pshocks^ Oih{ko, AT) 


with the functions defined by $h(x) = \frac{x - 1 + e^{-x}}{x^2}$ 


and l{x, y, z) = 
{x+y)z‘^ ). For small maturities, 
I and h{kaAT) ^ whereas for 


2 ’ 


^(-- 

xyz V 2; xz^ yz^ 

they verify l{k(^,kg, AT) ^ 

large maturities, ga,/ 3 {AT) « and h{kaAT) ^ ^cAt- 
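The limiting behaviour of the maturity functions can be checked numerically. The sketch below reads $h(x) = (x - 1 + e^{-x})/x^2$ and $g(x) = (1 - e^{-x})/x$; the form of $h$ is an assumption chosen to match the stated limits ($1/2$ for small maturities, $1/(k_\alpha \Delta T)$ for large ones). The small-argument series branches are an implementation choice to avoid cancellation and are not part of the source.

```python
import math


def g(x):
    """g(x) = (1 - exp(-x)) / x, with a series fallback near x = 0."""
    if abs(x) < 1e-8:
        return 1.0 - x / 2.0
    return -math.expm1(-x) / x


def h(x):
    """h(x) = (x - 1 + exp(-x)) / x^2, with a series fallback near x = 0."""
    if abs(x) < 1e-4:
        return 0.5 - x / 6.0 + x * x / 24.0
    return (x + math.expm1(-x)) / (x * x)


for x in (1e-6, 0.5, 2.0, 50.0):
    print(x, g(x), h(x))
```

Both functions decay like $1/x$ for large arguments, so the corresponding contributions shrink as the mean-reversion time becomes short relative to the maturity.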

Based on the estimated model parameters, we can compute 
the total variance qualitatively. As Fig. 4.3 illustrates, all 
three terms have a significant impact, including non-linear 
effects. In fact, the impact of large unexpected shocks should 
not be overlooked, as its contribution for short-term variance 
swaps can be large (e.g. of the order of 10% for 3-month swaps). 


As a final comment, we note that the term-structure of 
variance is rarely flat. In practice, the presence of a slope 
should be integrated in the total variance (see app. 6.3). As 
a result, volatility derivatives, such as options on variance, 
would require additional variance swap hedges of intermediate 
maturities $T_1 < t < T_2$. 


5 Conclusion 

In this paper, we investigated some characteristics of spot and 
volatility from an empirical perspective. We summarise below 
our findings: 

1. A Karhunen-Loève decomposition of the variance curve 
deformations shows that the first two eigenmodes 
account for almost 99% of the variance, corroborating 
previously reported results. A stochastic volatility model with 
only two factors is able to capture with a high degree of 
accuracy the daily variations of the variance curve up 
to half a year. 

2. The densities of the spot and volatility factors 
deviate markedly from normal well-behaved distributions. 
They exhibit significant skew, large excess kurtosis and 
fat tails, a known fact that has been frequently 
documented [7]. The relationship between spot and 
volatility is not linear; volatility is convex. In the case of 
the SPX index, accurate modeling can be achieved with 
a quadratic functional $f_\alpha$. As maturity increases, the 
magnitude of the non-linear component weakens. The 
fraction of the variance of variance unexplained by the 
functional relationship amounts to a third. 

3. The leverage correlation, which quantifies the 
correlation between a spot move today and tomorrow’s realised 
volatility, is accurately modeled by the term-structure 
of implied variance (Eq. and by the linear correlation 
between spot and volatility [IQIET]. 

4. The modeling of the volatility clustering, which 
measures the correlation between today’s and tomorrow’s 
realised volatilities, requires higher-order non-linear 
effects to capture the correlation $E[\delta W_t\,\delta Z_t^2]$. We found 
that the non-linear component explains about a third of 
the volatility clustering and is more pronounced on the 
short-term. The skew of the spot factor combined with 
the linear spot/vol correlation explains the remaining 
two thirds. 

5. The volatility of volatility is itself volatile, and appears 
to follow a mean-reverting process with a half-life 
shorter than a month. The addition of a stochastic 
time-varying volatility of volatility may be an interesting 
approach to generalize the current methodology and 
integrate additional information provided by the VVIX 
index (and VIX options). 

6. We studied the impact of non-linearities on the 
dynamics of smiles. As first noted in [21], the volatility skew 
generated by non-linear models is in general different 
from the skewness of the underlying. In the case of the 
SPX index, for which the linear spot/vol correlation 
remains the dominant factor, the impact of non-linearities 
is negligible and the skew-stickiness ratio defined in [4] 
is practically unchanged. Flatter and/or more convex 
volatility smiles, such as the ones on the foreign 
exchange market, could generate observable differences. 

7. Non-linearities contribute a significant part to the total 
volatility of the annualised variance, and as such, should 
be integrated in the pricing, modeling, and hedging of 
volatility derivatives. The convexity contribution 
decreases more slowly than the discrete sampling impact, 
but both contributions quickly become smaller than the 
volatility generated by the remarking of the implied 
parameters. However, we note that, as long as the correct 
vega is computed, hedging with spot variance will hedge 
the different contributions. 
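The first finding, that two eigenmodes capture almost all of the variance curve deformations, can be reproduced in spirit with a small sketch. The code below is a minimal illustration and not the procedure used in the paper: it computes the fraction of variance explained by the leading eigenmodes of a sample covariance matrix, using power iteration with deflation in place of a full eigen-solver. All function names and the synthetic two-factor data are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance of daily curve deformations (one row per day)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centred = [[r[j] - means[j] for j in range(m)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / n for j in range(m)]
            for i in range(m)]


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and eigenvector by power iteration."""
    m = len(matrix)
    vec = [1.0 / (i + 1) for i in range(m)]  # arbitrary starting vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0, vec
        vec = [x / norm for x in nxt]
    image = [sum(matrix[i][j] * vec[j] for j in range(m)) for i in range(m)]
    return sum(v * w for v, w in zip(vec, image)), vec


def explained_variance(rows, n_modes):
    """Fraction of total variance captured by the first n_modes eigenmodes."""
    cov = covariance_matrix(rows)
    m = len(cov)
    total = sum(cov[i][i] for i in range(m))
    captured = 0.0
    for _ in range(n_modes):
        eigenvalue, vec = top_eigenpair(cov)
        captured += eigenvalue
        # Deflation: remove the mode just found before searching for the next.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(m)]
               for i in range(m)]
    return captured / total


# Synthetic deformations driven by a level and a slope factor.
level, slope = [1.0, 1.0, 1.0, 1.0], [1.0, 0.5, -0.5, -1.0]
a = [1.0, -2.0, 0.5, 1.5, -1.0]
b = [0.3, 0.1, -0.4, 0.2, -0.2]
rows = [[x * l + y * s for l, s in zip(level, slope)] for x, y in zip(a, b)]
print(explained_variance(rows, 1), explained_variance(rows, 2))
```

Because the synthetic curves are built from exactly two factors, two modes explain all of the variance; on market data the same ratio gives the share attributed to the leading modes.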
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6 Appendix: Proofs 


By considering a small perturbation at first-order in the future instantaneous variance observed at a time v > t can 
be approximated from the previous value by the following formula: 


c=X 


\ 


t 


= er X 


denoted Xx'^-^v / 


(19) 


where the functions $\omega^\alpha$ are evaluated in the unperturbed state ($\theta_\alpha = 0$) with variances frozen at time $x \le t$ (see [21] for 
more details). Usually, the frozen time is taken to be either the start date (i.e. $x = 0$), or the current time (i.e. $x = t$). By 
freezing the variances, the functions become deterministic with no stochastic components. 


6.1 Convexity correction 

Under the volatility model defined in Eq. the value of a VIX future can easily be computed at first-order in the covariance 
parameters We express the time-Ti term-structure of variance as a perturbation of the term-structure observed at time 

t by writing with x 0aUJ^{v, u, ^t)dW^. 

A second-order extension in the perturbation curve leads to: 


vT^ = 



1 1 


1 1 

Et 

V AT / 

= Et 

V W / i^t+mTi))du 


V JTi 


V JTi 


E- 




^ ^nTi)du 1 ^nTi)duy^ 

1 rT2 


^ ^ Iti ^ (at Iti 

* 8 (J<fi)3 ‘ \ Jt, 

2 


¥l^ - - 


kT^ X 


1 C Quj-(v, U, ^t)du) 

Wr 


1 - 


-L —fla a f dvp— f ^^oj°‘(v,u,^t)du X d— f ^?oj^{v,u,^t)di 

8(“<t )^f:^ -ft at A, ^ AT A, ^ 


convexity correction 
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6.2 Impact of the volatility of volatility on the implied smile 

The presence of vol of vol alters the shape of the implied volatility surface. We introduce a scaling parameter $\lambda$ as $\theta_\alpha \to \lambda\theta_\alpha$, 
and consider the price of a call option $F_K(\lambda) = E[(S_T - K)^+]$ of strike $K$ in the presence of vol of vol $\lambda \neq 0$. Pricing is 
achieved under the risk-neutral measure, i.e. $\delta Z_t$ and $\delta W_t$ follow standard normal distributions. We also neglect the drift 
component. 


The volatility shift at strike k induced by the presence of vol of vol is: 


Sa{K,t,T) = a\K,t,T)-ays{t,T) = 


FKiX) - Fk{0) ^ 


Vega^ 


Vega^ 


where $\text{Vega}_K$ is the standard Black-Scholes vega. To compute $F'_K(0)$, we follow the same steps as in [21]. We express the spot 
at maturity $S_T$ as a function of an unperturbed state ($\lambda = 0$) and a first-order correction: 


log ^ = E E - IcszX « ^ (v^sz^ - 

^ U ^ U ^ '' u ^ ^ 

^ E E y E from Eq. [T9] 


From the above, we have T(A) = E[{Ste 


.^at+AEo 


- K)+], so that F'(0) = k]. 


For simplicity, we drop the a-terms and define and the moneyness Mk = log ^ ^ computa¬ 

tion of the integral ^^(0) is painful and computationally intensive. It requires numerous changes of variables and partial 
integrations. We sketch the proof below. First, we express the integral ^^(0) as the sum of two integrals: 


— ^tE ZAre^^fLEAr>iog ^ 


St J 


= StE 




■ o-nSiE 


Ln-i^N 

^ Xn-1 


1 


\/^ 




= X [/(7V-l) + a^J(7V-l)] 


We then make use of the following equality J 2 ^^^(ax b) = to derive the recursive equality: 


I{k-1) =I{k-2)^ 




^k-i 


E n 2 

i=k — l 


--J{k-2) 


The integral J{k — 1) can be computed as: 


J(k - 1 ) = 


/ V 2 

-| 1 E(t?)2 

8Wu Ee " 

V Ei^„ <^i 

“v^ 


u<k 






iL<k 








We denote VcR the total Black-Scholes variance VcR = E„=o ~ It Putting everything together, we find that: 

da 


3M 

XTeg&K 


= E 


2yJ{T - t)V(aR 


E 


Svlo°={u,v)E 




- ^t^u 

VolR 


U) 
6.3 Volatility of the annualised variance 

In order to derive the total variance of the annualized variance from its daily variation (Section 4.3), we make the assumption of a 
relatively flat term-structure of variance. We assume that the differences between the implied variance $D_t$ and the instantaneous 
variance $\xi_t^u$ for $u > t$ should be of second order (compared to the different integral terms involving the covariance 
parameters, kurtosis, and correlation terms). Although this approximation is rarely verified exactly, it does bring the advantage of 
obtaining an accurate closed-form solution. 


Under our assumption, the following approximation can be derived 

rT2 ^ r^2 


rl2 rl2 1 _ p-/Ca(T2-t) 

/ dQdu = V « Dt y 0 agc{T 2 - t)dw^ with g^{T 2 -t) = ——-- 

Jt Jt t) 


which applied to Eq. leads to the variation of the mark-to-market 

{szf - 5t) + y - t)dwr 


AT 


From there it is easy to compute the total variance. Our flat term-structure assumption means that the different volatility 
ratios (see below) can be neglected without much impact on the final solution. 


E' 


Ti 





ETd{^HSZl-5u)^] + 
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AT3 


E EtA 

^ kakff Jti ^ 


r](l- 


_ ^-/Ca(T2-w)^^2 _ ^-k^{T2-u) 


)dv 
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2 _ Q—kcx^T 2 — 2 — ^—{kcxd-kp)AT 


kcikj^AT ^ AT 
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